Computer vision has been studied for more than 40 years. Due to the increasingly diverse and rapidly developing topics in vision and related fields (e.g., machine learning, signal processing, cognitive science), the task of coming up with new research ideas is often daunting for junior graduate students. In this talk, I present five methods for coming up with new research ideas. For each method, I give several examples (i.e., existing works in the literature) to illustrate how it works in practice.
This is a common-sense talk and will not involve complicated math equations or theories.
Note: The content of this talk is inspired by the "Raskar Idea Hexagon" - Prof. Ramesh Raskar's talk on "How to Come Up with New Ideas".
To download the presentation slides with videos, please visit
http://jbhuang0604.blogspot.com/2010/05/how-to-come-up-with-new-research-ideas.html
For the video lecture (in Chinese), please visit
http://jbhuang0604.blogspot.com/2010/06/blog-post_14.html
1. How to Come Up With New
Research Ideas?
Jia-Bin Huang
jbhuang0604@gmail.com
Taiwan
May, 2010
1 / 94
2. What is this talk about?
Five approaches to coming up with new ideas in computer vision.
Extensive case studies (i.e., more than one hundred papers).
A common-sense talk. No complicated theories or equations.
I wish someone had told me this before.
Reference
The content of this talk is greatly inspired by the "Raskar Idea Hexagon".
5. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions: neXt = X^d
Combine two or more topics: neXt = X + Y
Re-think the research directions: neXt = X̄
Use powerful tools, find suitable problems: neXt = X↑
Add an appropriate adjective: neXt = Adj + X
3 What is a bad idea?
7. Active Topics in Computer Vision
[Szeliski Computer Vision: Algorithms and Applications 2010]
Digital image processing
Blocks world, line labeling
Generalized cylinders
Pictorial structures
Stereo correspondence
Intrinsic images
Optical flow
Structure from motion
Image pyramids
Scale-space processing
Shape from X
Physically-based modeling
Regularization
Markov Random Fields
Kalman filters
3D range data processing
Projective invariants
Factorization
Physics-based vision
Graph cuts
Particle filtering
Energy-based segmentation
Face recognition and detection
Subspace methods
Image-based modeling/rendering
Texture synthesis/inpainting
Computational photography
Feature-based recognition
MRF inference algorithms
Learning
8. What can we learn from the past?
The topics are diverse and evolve over time.
The ways to come up with new ideas are similar; there are patterns to follow.
9. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions: neXt = X^d
Combine two or more topics: neXt = X + Y
Re-think the research directions: neXt = X̄
Use powerful tools, find suitable problems: neXt = X↑
Add an appropriate adjective: neXt = Adj + X
3 What is a bad idea?
11. Seek different dimensions: neXt = X^d
"The only difference between a rut and a grave is their dimensions." - Ellen Glasgow
12. Seek different dimensions: neXt = X^d
Idea
Can we increase/replace/transform the dimensions of the original problem to get new problems/solutions?
What kind of dimensions can we work on?
1 Concrete dimensions (e.g., space, time, frequency)
2 Abstract dimensions (e.g., properties)
13. EX 1-1. Content-Aware Media Resizing
[Avidan et al. SIGGRAPH 07] [Rubinstein et al. SIGGRAPH 08]
Ideas
Extend dimensions from 2D image to 3D video: image re-targeting ⇒ video re-targeting
Other dimensions? E.g., 4D light field, infrared image, range image.
14. EX 1-2. Video Stitching
[Rav-Acha et al. CVPR 05]
Input video Dynamic Panorama
Ideas
Extend dimensions from image to video, i.e., Image Panorama ⇒
Video Mosaics with Non-Chronological Time
Increase the time dimension in both input and output
12 / 94
15. EX 1-3. Multi-Image Fusion
[Agarwala et al. SIGGRAPH 04]
Ideas
Extend from a single input image to multiple input images ⇒ Digital
Photomontage
Increase the dimension of the input only.
13 / 94
16. EX 1-4. Computational Photography (Coded Photography)
[Raskar et al. SIGGRAPH 04, 06, 08] [Levin et al. SIGGRAPH 07]
Ideas
Coded Photography: reversibly encode information about the
scene in a single photograph
Coding in Time (exposure), Coded Illumination, Coding in Space
(aperture), and Coded Wavelength
Replace a dimension to encode information about the light field
14 / 94
17. EX 1-1. Photography in Low Light Conditions
Flash Blurred Noisy
What can we do?
Flash → Changes the overall scene appearance (cold and gray)
Long exposure time (hand shake) → Blurred image
Short exposure time (insufficient light) → Noisy image
15 / 94
18. EX 1-1-1. Flash/non-Flash Photography
[Petschnigg et al. SIGGRAPH 2004]
Flash No flash Detail transfer with denoising
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a flash/no-flash image pair)
makes the problem much easier.
16 / 94
19. EX 1-1-2. Image Deblurring with Blurred/Noisy Image
Pairs
[Yuan et al. SIGGRAPH 2007]
Blurred Noisy Enhanced noisy Deblurred result
Ideas
The original problem (taking a good photo in low-light, flash-prohibited
environments from a single image) is difficult.
Increasing the dimension of the input (a blurred/noisy image pair)
makes the problem much easier.
17 / 94
20. EX 1-1-3. Robust Flash Deblurring
[Zhou et al. CVPR 2010]
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a blurred/flash image pair)
makes the problem much easier.
18 / 94
21. EX 1-1-4. Dark Flash Photography
[Krishnan et al. SIGGRAPH 2009]
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a dark-flash/noisy image pair)
makes the problem much easier.
19 / 94
22. EX 1-2. Brute-Force Vision
[Hays and Efros SIGGRAPH 07] [Dale et al. ICCV 09] [Agarwal et al. ICCV 09]
[Furukawa et al. ICCV 09]
Ideas
Utilize a large collection of photos.
20 / 94
23. EX 2-1. X Alignment/Registration (pixel, object, scene)
[Liu et al. CVPR 08, ECCV 08] [Berg et al. CVPR 05]
21 / 94
24. EX 2-2. Shape from X (shading, texture, specular)
[Lobay and Forsyth IJCV 06] [Fleming et al JOV 04] [Adato et al ICCV 07]
shading specular
texture specular flow
22 / 94
25. EX 2-3. Depth from X (stereo, (de-)focus, coded
aperture, diffusion, occlusion, semantic label)
[Levin et al. SIGGRAPH 07] [Hoiem et al. ICCV 07] [Liu et al. CVPR 10] [Zhou et al.
CVPR 10]
Coded Aperture Semantic Labels
Occlusion Diffusion
23 / 94
26. EX 2-4. Infer X from a single image (geometric,
geography, illumination)
[Hoiem et al. ICCV 05] [Hays and Efros CVPR 08] [Lalonde et al. ICCV 09]
Geometric
Geography
Illumination
24 / 94
27. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
25 / 94
28. Combine two or more topics neXt = X + Y
To steal ideas from one person is
plagiarism. To steal from many is
research. - Wilson Mizner
26 / 94
29. Combine two or more topics neXt = X + Y
Idea
Can we combine two or more topics to get new problems or
solutions?
What kind of topics can we combine?
1 X, Y are methods
2 X, Y are problems
3 X, Y are areas
27 / 94
30. EX 1-1. Viola-Jones Object Detection Framework
[Viola and Jones CVPR 2001]
Simple feature Integral img Boosting Cascade structure
Ideas
Paper title: Rapid Object Detection using a Boosted Cascade of
Simple Features
Viola-Jones object detection framework = Integral Images (simple
features, 1984) + AdaBoost (1997) + Cascade architecture (long
time ago)
28 / 94
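The integral image that makes the simple features fast can be sketched in a few lines of NumPy. This is a minimal illustration of the data structure, not the authors' implementation: once the integral image is built, any rectangular sum takes only four array lookups.

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y+1, :x+1]; cumulative sums along both axes.
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from four integral-image lookups.
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

Haar-like features are differences of such box sums, so each feature costs a constant number of lookups regardless of window size.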
31. EX 1-2. SIFT Flow = SIFT + Optical Flow
[Liu et al. ECCV 08 CVPR 09]
Motion hallucination
Label transfer
Ideas
Dense sampling in time : optical flow :: dense sampling in world
images : SIFT flow
29 / 94
32. EX 1-3. Visual Tracking with Online Multiple Instance
Boosting
[Babenko et al. CVPR 09]
Ideas
MILTrack = Multiple Instance Boosting (2005) + Online Boosting
Tracking (2006)
30 / 94
33. EX 2-1. High Dynamic Range Image Reconstruction
from Hand-held Cameras
[Lu et al. CVPR 2009]
Ideas
HDR from from Hand-held Cameras = High Dynamic Range
Image Reconstruction + Image Deblurring
31 / 94
34. EX 2-2. Human Body Understanding
[Guan et al. ICCV 09]
Ideas
Human Body Understanding = Shape Reconstruction + Pose
Estimation
32 / 94
35. EX 2-3. Image Understanding
detection, tracking, recognition, segmentation, reconstruction, scene classification,
event recognition
33 / 94
36. EX 2-3-1. Detection + Tracking
[Andriluka et al. CVPR 08]
Ideas
People detection and people tracking are highly correlated
problems.
Combining the two problems can potentially achieve improved
performance on the individual tasks.
34 / 94
37. EX 2-3-2. Object Attribute + Recognition
[Farhadi et al. CVPR 09] [Lampert et al. CVPR 09]
Ideas
Describe images by attributes
Enables knowledge transfer to classes with no visual
examples
35 / 94
38. EX 2-3-2. Object Recognition + Detection
[Yeh et al. CVPR 09]
Ideas
Concurrent object localization and recognition
36 / 94
39. EX 2-3-3. Image Segmentation + Object Recognition
+ Event Recognition
[Li et al. CVPR 09]
Ideas
Combine scene classification, image segmentation, image
annotation
All three tasks are mutually beneficial
37 / 94
40. EX 3-1. SixthSense - A Wearable Gestural Interface
[Mistry and Maes TED 2009]
Ideas
SixthSense = Computer Vision (e.g., tracking, recognition) +
Internet
38 / 94
41. EX 3-2. Sikuli:Picture-driven computing
[Yeh et al. UIST 09] [Chang et al. CHI 10]
Ideas
1. Readability/usability, 2. GUI serialization, 3. Computer vision
on computer-generated figures
39 / 94
42. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
40 / 94
43. Re-think the research directions neXt = X̄
If at first, the idea is not absurd, then
there is no hope for it -
Albert Einstein
41 / 94
44. Re-think the research directions neXt = X̄
Ideas
Do the current research directions really make sense? What's the
key problem?
What could we do?
1 Re-formulate the original problem.
2 Analyze and compare existing approaches; provide insight into the
problems.
42 / 94
45. EX 1-1. Beyond Sliding Windows
[Lampert et al. CVPR 08]
Rectangle set Branch and bound search
Ideas
Sliding window search ⇔ branch-and-bound search
Represent a set of rectangles with 4 intervals
Use branch-and-bound to find the optimal rectangle (object
localization) efficiently
43 / 94
46. EX 1-2. Beyond Categories
[Malisiewicz and Efros CVPR 08, NIPS 09]
Ideas
Explicit categorization ⇔ Implicit categorization
Ask "what is this like?" (association), instead of "what is it?"
(categorization)
44 / 94
47. EX 1-3. Motion-Invariant Photography
[Levin et al. SIGGRAPH 08] [Cho et al. ICCP 10]
Ideas
Still camera ⇔ Moving camera (parabolic exposures)
Enables the use of spatially-invariant blur kernel estimation
45 / 94
48. EX 1-4. Super-resolution from Single Image
[Glasner et al. ICCV 09]
Ideas
Classical multi-image SR / example-based SR ⇔ a unified
single-image SR framework
46 / 94
49. EX 2-1. In Defense of ...
[Boiman et al. CVPR 08] [Hartley PAMI 97]
Nearest-Neighbor Based Image Classification
No quantization of local image descriptors (quantization, used to
generate "bags-of-words" codebooks, discards discriminative
information).
Computes "Image-to-Class" distance, instead of
"Image-to-Image" distance
The performance ranks among the top leading learning-based
image classifiers
The 8-point Algorithm for the fundamental matrix
Normalization, Normalization, Normalization!
Performs almost as well as the best iterative algorithm
47 / 94
50. EX 2-2. Understanding blind deconvolution
[Levin et al. CVPR 2009]
Ideas
Blind deconvolution: recover sharp image x from the blurred one
(y = k ⊗ x + n).
MAPx,k estimation often favors no-blur explanations.
MAPk can be accurately estimated since the kernel size is often
much smaller than the image size.
Blind deconvolution should be addressed in this way: MAPk
estimation + non-blind deconvolution.
48 / 94
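The non-blind half of that recipe can be sketched with a frequency-domain Wiener filter. This is a hedged illustration, not Levin et al.'s algorithm: it assumes the kernel k is already known and that the blur uses circular (periodic) boundary conditions.

```python
import numpy as np

def wiener_deconv(y, k, snr=100.0):
    # Non-blind deconvolution: given blurred signal y = k * x + n and a
    # KNOWN kernel k, estimate x with a regularized inverse filter.
    n = len(y)
    K = np.fft.fft(k, n)   # zero-pad the kernel to the signal length
    Y = np.fft.fft(y)
    # The 1/snr term suppresses noise amplification where |K| is small.
    X = np.conj(K) * Y / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft(X))
```

With the kernel known, deconvolution reduces to a per-frequency division; the hard part of blind deconvolution is estimating k in the first place, which is exactly the MAPk step the slide advocates.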
51. EX 2-3. Understanding camera trade-offs
[Levin et al. ECCV 08]
Ideas
Traditional optics evaluation: 2D image sharpness (e.g., Modulation
Transfer Function)
Modern camera evaluation: How well does the recorded data
allow us to estimate the visual world - the lightfield?
49 / 94
52. EX 2-4. What is a good image segment?
[Bagon et al. ECCV 08]
Ideas
A good image segment is one that can be easily composed from
its own pieces, but is difficult to compose from pieces of other
parts of the image
50 / 94
53. EX 2-5. Lambertian Reflectance and Linear
Subspaces
[Basri and Jacobs PAMI 03]
Ideas
The set of all Lambertian reflectance functions (the mapping from
surface normals to intensities) obtained with arbitrary distant light
sources lies close to a 9D linear subspace.
Explain prior empirical results using linear subspace methods.
51 / 94
54. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
52 / 94
55. Use powerful tools, find suitable problems neXt = X ↑
If the only tool you have is a hammer,
you tend to see every problem as a
nail. - Abraham Maslow
53 / 94
56. Use powerful tools, find suitable problems neXt = X ↑
What kinds of tools should we understand?
Calculus of Variations
Dimensionality Reduction
Spectral Methods (specifically, spectral clustering)
Probabilistic Graphical Model
Structured Prediction
Bilateral Filtering
Sparse Representation
and more, e.g., spectral theory, information theory, (convex)
optimization, etc.
54 / 94
57. EX 1. Calculus of Variations (1/2)
From Calculus to Calculus of Variations
Calculus ⇔ Calculus of Variations
Functions f: R^n → R ⇔ Functionals (functions of functions) f: F → R, f(u) = ∫_{x1}^{x2} L(x, u(x), u'(x)) dx
Derivative df(x)/dx = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx ⇔ Variation δf(u)/δu = (∂/∂ε) f(u + ε·δu)|_{ε=0}
Local extremum: df(x)/dx = 0 ⇔ Local extremum: Euler-Lagrange equation
Total Variation (TV)
TV(y) = ∫_{x0}^{x1} |y'(x)| dx: the "oscillation strength" of y(x)
55 / 94
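As a concrete illustration of the TV functional (a minimal sketch, assuming a unit-spaced 1-D signal), the discrete total variation is just the sum of absolute differences:

```python
import numpy as np

def total_variation(y):
    # Discrete total variation: sum of |y[i+1] - y[i]|.
    # On a unit-spaced grid this approximates ∫ |y'(x)| dx,
    # since |Δy/Δx| · Δx = |Δy|.
    return float(np.abs(np.diff(y)).sum())
```

A monotone signal has TV equal to its range; every oscillation adds to it, which is why TV works as a measure of "oscillation strength" for denoising.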
58. EX 1. Calculus of Variations (2/2)
Total Variation Denoising/Inpainting
Applications in computer vision
Optical flow [Horn and Schunck AI 81]
Shape from shading [Horn and Brooks CVGIP 86]
Edge detection [PAMI 87]
Anisotropic diffusion [Perona and Malik PAMI 90]
Active contours model [Kass et al. IJCV 88]
Image segmentation [Morel and Solimini 95]
Image restoration [Aubert and Vese SIAM Journal on NA 97]
56 / 94
60. EX 2. Dimensionality Reduction (1/2)
Why do we need dimensionality reduction?
Since high-dimensional data is everywhere (e.g., images, human gene
distributions, weather prediction), we need dimensionality reduction for
1 processing data efficiently
2 estimating the distributions of data accurately (curse of
dimensionality)
3 finding meaningful representations of data
Classification of dimensionality reduction methods
Linear, global structure preserved: PCA, LDA
Linear, local structure preserved: LPP, NPE
Nonlinear, global structure preserved: ISOMAP, Kernel PCA, DM
Nonlinear, local structure preserved: LLE, LE, HE
57 / 94
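As a minimal example of the linear, global-structure-preserving corner of that classification, PCA via the SVD can be sketched as follows (an illustration of the tool, not tied to any particular paper):

```python
import numpy as np

def pca(X, k):
    # X: (n_samples, n_features) data matrix.
    # Center the data, then take the top-k right singular vectors as the
    # principal directions (directions of maximum global variance).
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]               # (k, n_features), orthonormal rows
    return Xc @ components.T, components
```

Nonlinear methods such as ISOMAP or LLE replace the global linear projection with one that preserves geodesic or local-neighborhood structure instead.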
62. EX 2. Dimensionality Reduction (2/2)
Applications in computer vision
Subspace as constraints
Structure from motion [Tomasi and Kanade IJCV 92], Optical flow
[Irani IJCV 02], Layer extraction [Ke and Kanade CVPR 01], Face
alignment [Saragih et al. ICCV 09]
Face recognition (e.g., PCA, LDA, LPP)
PCA [Turk and Pentland PAMI 91], LDA [Belhumeur et al. PAMI 97],
LPP [He et al. PAMI 05], Random [Wright et al. PAMI 09]
Motion segmentation
subspace separation [Kanatani ICCV 01] [Yan and Pollefeys ECCV
06] [Rao et al. CVPR 08] [Lauer and Schnorr ICCV 09]
Lighting
linear subspace [Belhumeur and Kriegman IJCV 98] [Georghiades
et al. PAMI 01] [Lee et al. PAMI 05] [Basri and Jacobs PAMI 03]
Visual tracking
incremental subspace learning [Ross et al. IJCV 08] [Li et al. CVPR
08]
58 / 94
67. EX 3. Spectral Clustering (1/3)
Why is spectral clustering popular?
Can be solved efficiently by standard linear algebra software
Very often outperforms traditional clustering algorithms
Spectral clustering algorithm
Input: a set of data points
1 Construct a similarity graph, e.g., ε-neighborhood, k-nearest neighbor,
fully connected
2 Construct a graph Laplacian, e.g., (un)normalized (L, Lrw, Lsym)
3 Compute the first k eigenvectors (those with the smallest eigenvalues)
of L: v1, ..., vk
4 Let V ∈ R^(n×k) be the matrix containing v1, ..., vk as columns
5 Cluster the row vectors yi of V with the k-means algorithm into clusters
C1, ..., Ck
Output: Clusters A1, ..., Ak with Ai = {j | yj ∈ Ci}
59 / 94
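The five steps above can be sketched directly in NumPy. This is a minimal illustration assuming a fully connected Gaussian-similarity graph, the unnormalized Laplacian, and a tiny farthest-point-initialized Lloyd's k-means; a real implementation would use a sparse graph and a more robust k-means.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    # 1. Fully connected similarity graph with a Gaussian kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # 3. Eigenvectors for the k smallest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]                      # 4. rows of V are the embedded points
    # 5. k-means on the rows of V: farthest-point init, then Lloyd iterations.
    centers = [V[0]]
    for _ in range(1, k):
        d = ((V[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(V[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((V[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = V[labels == j].mean(axis=0)
    return labels
```

The point of the spectral embedding is that clusters which are not linearly separable in the input space become tight, well-separated point groups in the rows of V, where plain k-means suffices.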
74. EX 3. Spectral Clustering (2/3)
Why does it work?
Graph Cut Point of View: Construct a partition that minimizes the
weight across the cut (the well-known mincut problem) while
balancing the clusters (e.g., RatioCut, Normalized Cut).
Random Walks Point of View: When minimizing Ncut, we
actually look for a cut through the graph such that a random walk
seldom transitions from one cluster to another.
Perturbation Theory Point of View: The distance between
eigenvectors of the ideal and the nearly ideal graph Laplacian is
bounded by a constant times a norm of the error matrix. If the
perturbations are small enough, the k-means algorithm
will still separate the groups from each other.
60 / 94
77. EX 3. Spectral Clustering (3/3)
[Shi and Malik PAMI 02]
Eigenvectors carry contour information.
61 / 94
78. EX 4. Probabilistic Graphical Model (1/2)
What are probabilistic graphical models?
A marriage between probability theory and graph theory.
A natural tool for dealing with uncertainty and complexity
Provides a way to view all probabilistic systems (e.g., mixture
models, factor analysis, hidden Markov models, Kalman filters and
Ising models) as instances of a common underlying formalism.
62 / 94
80. EX 5. Structured Prediction (1/2)
What is structured prediction?
Structured prediction is a framework for solving problems of
classification or regression in which the output variables are
mutually dependent or constrained.
Lots of examples
Natural language parsing
Machine translation
Object segmentation
Gene prediction
Protein alignment
Numerous tasks in computational linguistics, speech, vision,
biology.
64 / 94
82. EX 5. Structured Prediction (2/2)
Applications [Lampert et al. ECCV 08] [Desai et al. ICCV 09]
65 / 94
83. EX 6. Bilateral Filtering (1/3)
What’s Bilateral Filtering?
A technique to smooth images while preserving edges
Ubiquitous in image processing, computational photography
66 / 94
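The edge-preserving smoothing can be sketched with a brute-force NumPy implementation (a minimal illustration for small grayscale images in [0, 1]; practical versions use the fast approximations from the computational photography literature):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    # Each output pixel is a weighted mean of its neighborhood, with
    # weights combining spatial distance (sigma_s) and intensity
    # difference (sigma_r); large intensity jumps get ~zero weight,
    # which is what preserves edges.
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    pad = np.pad(img, radius, mode='edge')
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rangew = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            w = spatial * rangew
            out[i, j] = (w * patch).sum() / w.sum()
    return out
```

Dropping the range term (sigma_r → ∞) recovers an ordinary Gaussian blur, which smooths across edges; the range term is the entire difference.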
84. EX 6. Bilateral Filtering (2/3)
[Bennett and McMillan SIGGRAPH 05] [Eisemann and Durand SIGGRAPH 04] [Jones
et al. SIGGRAPH 03] [Winnem¨oller et al. SIGGRAPH 06] [Bae et al. SIGGRAPH 02]
67 / 94
85. EX 6. Bilateral Filtering (3/3)
How does the bilateral filter relate to other methods?
Interpretation
The bilateral filter is equivalent to mode filtering in local histograms
The bilateral filter can be interpreted in terms of robust statistics since
it is related to a cost function
The bilateral filter is a discretization of a particular kind of
PDE-based anisotropic diffusion
68 / 94
87. EX 7. Sparse Representation (1/4)
Ideas
Natural signals (e.g., audio, images) usually admit a sparse
representation (i.e., can be well represented by a linear
combination of a few atom signals)
Successfully applied to various areas in signal/image processing,
vision, and graphics.
69 / 94
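A standard way to compute such a representation is a greedy pursuit. Below is a minimal Orthogonal Matching Pursuit sketch (an illustration of the idea, assuming a dictionary with unit-norm columns; production code would use an optimized solver):

```python
import numpy as np

def omp(D, y, k):
    # Greedily pick k atoms (columns of D, unit norm) to approximate y.
    residual, support = y.copy(), []
    for _ in range(k):
        # Atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares fit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

The returned x is k-sparse: y ≈ Dx with only k nonzero coefficients, which is exactly the "linear combination of a few atom signals" the slide describes.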
88. EX 7. Sparse Representation (2/4)
Image Restoration [Aharon et al. TSP 06] [Julien et al. TIP 08]
Denoising, inpainting, demosaicking
70 / 94
89. EX 7. Sparse Representation (3/4)
Classification [Wright et al. PAMI 09] [Julien et al. CVPR ECCV NIPS 08]
Face recognition, edge detection, texture classification, pixel classification
71 / 94
90. EX 7. Sparse Representation (4/4)
Compressive sensing [Donoho TIT 06] [Candes and Tao TIT 05 06]
and more (e.g., low-rank matrix completion, robust PCA)
72 / 94
91. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
73 / 94
92. Add an appropriate adjective neXt = Adj + X
There is only one religion, though
there are a hundred versions of it. -
George Bernard Shaw
74 / 94
93. Add an appropriate adjective neXt = Adj + X
What kinds of adjective can we use?
linear ⇔ non-linear
generative/reconstructive ⇔ discriminative
rule-based / hand-designed ⇔ learning-based
single scale ⇔ multi-scale
single step ⇔ progressive
batch processing ⇔ incremental / online processing
fixed ⇔ adaptive / dynamic to data
parametric ⇔ non-parametric
Z-invariant (Z = translation / scale / rotation / noise / facial
expression / pose / lighting / occlusion)
Z-aware (Z = motion / content / semantic / context / occlusion)
75 / 94
103. EX 1. Linear ⇔ Non-linear
Hard to find a straight line to separate them into two clusters?
Ideas
Linear methods may not capture the nonlinear structure in the
original data representation
Nonlinear methods
Kernel tricks (e.g., kernel PCA, kernel LDA, kernel SVM)
Manifold learning (e.g., ISOMAP, LLE, Laplacian eigenmaps)
76 / 94
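The kernel idea above can be seen with an explicit feature map instead of a kernel function. The sketch below (illustrative only, numpy-only, synthetic data) builds two concentric rings that no straight line can separate in 2-D, then maps each point (x1, x2) to (x1, x2, x1² + x2²); the added squared-radius coordinate makes the classes separable by a single threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: no straight line separates them in 2-D.
def make_rings(n=200):
    theta = rng.uniform(0, 2 * np.pi, n)
    inner = np.c_[0.5 * np.cos(theta), 0.5 * np.sin(theta)]
    outer = np.c_[2.0 * np.cos(theta), 2.0 * np.sin(theta)]
    X = np.vstack([inner, outer])
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = make_rings()

# Explicit nonlinear feature map phi(x) = (x1, x2, x1^2 + x2^2):
# the squared-radius coordinate separates the rings linearly in 3-D.
Z = np.c_[X, (X ** 2).sum(axis=1)]

# A single threshold on the new coordinate already classifies perfectly
# (inner ring has r^2 = 0.25, outer ring has r^2 = 4).
pred = (Z[:, 2] > 1.0).astype(float)
accuracy = (pred == y).mean()
print(accuracy)  # 1.0
```

Kernel methods (kernel PCA, kernel SVM) achieve the same effect implicitly, without ever computing the mapped coordinates.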
105. EX 2. Generative ⇔ Discriminative
Classification task: X → Y
Generative classifiers estimate class-conditional pdfs P(X|Y) and
prior probabilities P(Y)
Naive Bayes, Mixtures of Gaussians, Mixtures of experts, Hidden
Markov Models (HMM), Sigmoidal belief networks, Bayesian
networks, Markov random fields (MRF)
Discriminative classifiers estimate posterior probabilities P(Y|X)
Logistic regression, SVMs, Traditional neural networks, Nearest
neighbor, Conditional Random Fields (CRF)
Bayes' rule: P(Y|X) = P(X|Y)P(Y) / P(X)
Two different perspectives on viewing a problem
77 / 94
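A minimal generative classifier can be sketched directly from Bayes' rule. The toy example below (illustrative, numpy-only, 1-D synthetic data) fits a Gaussian class-conditional density P(X|Y) per class, multiplies by the prior P(Y), and predicts the class with the larger posterior, exactly the generative recipe on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two 1-D classes with Gaussian class-conditional densities.
x0 = rng.normal(-2.0, 1.0, 500)  # class 0
x1 = rng.normal(+2.0, 1.0, 500)  # class 1
X = np.r_[x0, x1]
y = np.r_[np.zeros(500), np.ones(500)]

# Generative route: estimate P(X|Y) and P(Y), then apply Bayes' rule.
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu0, s0 = x0.mean(), x0.std()
mu1, s1 = x1.mean(), x1.std()
prior0 = prior1 = 0.5

# P(Y|X) is proportional to P(X|Y) P(Y); P(X) cancels in the comparison.
post0 = gauss_pdf(X, mu0, s0) * prior0
post1 = gauss_pdf(X, mu1, s1) * prior1
pred = (post1 > post0).astype(float)

print((pred == y).mean())  # high accuracy, close to the Bayes optimum (~0.98)
```

A discriminative classifier (e.g., logistic regression) would skip density estimation and model P(Y|X) directly, which is the other perspective on the same problem.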
108. EX 3. Rule-based / Hand-designed ⇔ Learning-based
Hard to find rules to recognize digits?
Ideas
It may be difficult to design a set of rules for certain tasks, such as
handwritten digit recognition
Turn to machine learning methods instead
78 / 94
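The "learn instead of hand-code" idea can be shown with the simplest possible learner. The sketch below (illustrative only; the 3×3 "digit" templates and noise level are made up for the demo) trains a perceptron on noisy binary patterns for 0 and 1 instead of hand-writing rules like "a one is a vertical stroke":

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3x3 binary templates standing in for handwritten 0 and 1.
ZERO = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]], float).ravel()
ONE = np.array([[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]], float).ravel()

def noisy(template, n, flip_p=0.1):
    """n noisy copies of a template, each pixel flipped with prob. flip_p."""
    X = np.tile(template, (n, 1))
    flips = rng.random(X.shape) < flip_p
    return np.abs(X - flips)

X = np.vstack([noisy(ZERO, 50), noisy(ONE, 50)])
y = np.r_[np.zeros(50), np.ones(50)]

# Instead of hand-designing rules, learn a linear classifier (perceptron)
# from labeled examples.
w, b = np.zeros(9), 0.0
for _ in range(20):                      # a few passes over the data
    for xi, yi in zip(X, y):
        pred = float(w @ xi + b > 0)
        w += (yi - pred) * xi            # perceptron update rule
        b += (yi - pred)

acc = ((X @ w + b > 0).astype(float) == y).mean()
print(acc)  # near-perfect on this easy toy problem
```

The weights are discovered from data; no rule about strokes or corners was ever written down, which is the whole point of the learning-based alternative.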
109. EX 4. Single scale ⇔ Multi-scale
[Zelnik-Manor and Perona NIPS 04]
Ideas
We live in a multi-scale world (atom ↔ universe)
Image pyramids / scale-space theory / wavelet representations →
all attempt to capture the multi-scale properties of signals/images
79 / 94
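An image pyramid, the simplest of the multi-scale representations listed above, can be sketched in a few lines. This is an illustrative numpy-only version: real Gaussian pyramids smooth with a 5-tap Gaussian kernel before subsampling, while this sketch uses a 2×2 box filter to stay dependency-free:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 box blur followed by factor-2 subsampling.
    (A Gaussian pyramid would use a Gaussian kernel here instead.)"""
    h, w = img.shape
    h, w = h - h % 2, w - w % 2          # crop to even dimensions
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def pyramid(img, levels):
    """Stack of progressively coarser copies of the image."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out

img = np.random.default_rng(3).random((64, 64))
levels = pyramid(img, 4)
print([l.shape for l in levels])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

A multi-scale method then runs its analysis at every level, so structures too large (or too small) to be seen at one scale are caught at another.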
110. EX 5. Single step ⇔ Progressive
[Yuan et al. SIGGRAPH 08]
Ideas
Some problems are difficult to solve in one step → solve them
progressively
80 / 94
111. EX 6. Batch processing ⇔ Incremental / Online
processing
Ideas
Online methods can handle potentially infinite data samples and
time-varying data
Examples
PCA → Incremental PCA (many variants)
LDA → Incremental LDA (many variants)
SVM → Incremental and decremental SVM [Cauwenberghs and
Poggio NIPS 01]
Dictionary learning (e.g., K-SVD) [Aharon and Elad TSP 06] →
Online dictionary learning [Mairal et al. ICML/JMLR 09]
AdaBoost → Online boosting [Grabner and Bischof CVPR 06]
Multiple instance boosting → Online multiple instance boosting
[Babenko et al. CVPR 09]
81 / 94
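The batch → online pattern is easiest to see on the smallest possible statistic. The sketch below (illustrative only; the incremental-PCA variants above follow the same spirit with more machinery) computes a running mean one sample at a time, never storing the stream, and matches the batch answer:

```python
import numpy as np

rng = np.random.default_rng(4)

# Online (Welford-style) mean update: process one sample at a time,
# never storing the whole (potentially infinite) data stream.
def online_mean(stream):
    mean, n = 0.0, 0
    for x in stream:
        n += 1
        mean += (x - mean) / n   # incremental correction toward the new sample
    return mean

data = rng.normal(5.0, 2.0, 10_000)

# The online estimate agrees with the batch computation.
print(np.isclose(online_mean(data), data.mean()))  # True
```

Incremental PCA, online dictionary learning, and online boosting all generalize this idea: keep a compact summary, update it per sample (or per mini-batch), and discard the raw data.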
117. EX 7. Fixed ⇔ Adaptive / Dynamic
[Elad and Aharon TIP 06]
Ideas
Adaptive approaches usually outperform the predefined/fixed
ones.
82 / 94
118. EX 8. Parametric ⇔ Non-parametric
Probability density estimation
Parametric
Assumes a specific functional form with parameter θ
e.g., Gaussian distribution with unknown mean and variance, mixture
of Gaussians
Parameter estimation
Estimative approach: p(x) = p(x|θ_best)
Bayesian approach: p(x) = ∫ p(x|θ)p(θ)dθ
Non-parametric
Does not assume a specific form of the probability distribution
e.g., histogram, kernel density estimation (the Parzen window method)
83 / 94
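Both density-estimation routes fit in a few lines. The sketch below (illustrative, numpy-only, synthetic standard-normal data) estimates the same density parametrically, by fitting the two Gaussian parameters, and non-parametrically, by a histogram that assumes no functional form; near the mode both should land close to 1/√(2π) ≈ 0.399:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(0.0, 1.0, 5_000)

# Parametric: assume a Gaussian form, estimate its two parameters.
mu, sigma = data.mean(), data.std()
def gauss_pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Non-parametric: a histogram makes no assumption about the functional form.
counts, edges = np.histogram(data, bins=50, density=True)

# Compare the two estimates in the bin nearest x = 0.
centers = 0.5 * (edges[:-1] + edges[1:])
mode_bin = np.argmin(np.abs(centers))
print(round(gauss_pdf(0.0), 2), round(counts[mode_bin], 2))  # both near 0.40
```

The trade-off on the slide shows up directly: the parametric estimate is smooth but wrong if the assumed form is wrong; the histogram adapts to any shape but is noisier and needs more data.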
120. EX 9. Z - invariant
Make your method robust to potential sources of performance degradation:
noise (e.g., Gaussian additive noise, impulse noise, non-uniform
noise) (e.g., image restoration)
translation shift (e.g., near-duplicate image/video detection, image
search)
scale change (e.g., object detection, feature extraction)
perspective distortion (e.g., feature extraction)
deformation (e.g., non-rigid registration, part-based object
detection)
pose variation (e.g., human pose estimation)
lighting variation (e.g., face recognition)
partial occlusion (e.g., object detection and recognition)
84 / 94
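One classic way to get translation invariance, relevant to the near-duplicate detection example above, is to represent a signal by its Fourier magnitude: a (circular) shift changes only the phase of the transform, never the magnitude. A minimal numpy sketch:

```python
import numpy as np

# The magnitude of the Fourier transform is invariant to circular shifts:
# shifting a signal multiplies its spectrum by a phase factor, so |F| is
# unchanged. This gives a simple translation-invariant representation.
rng = np.random.default_rng(6)
signal = rng.random(128)
shifted = np.roll(signal, 17)           # translated copy of the same signal

mag = np.abs(np.fft.fft(signal))
mag_shifted = np.abs(np.fft.fft(shifted))

print(np.allclose(mag, mag_shifted))    # True: magnitudes match
print(np.allclose(signal, shifted))     # False: raw signals differ
```

The same design strategy, building a representation in which the nuisance factor Z provably cancels, is what the scale-, rotation-, and lighting-invariant methods on the slide pursue for their respective Z.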
128. EX 10. Z - aware
[Wang et al. SIGGRAPH Asia 09] [Wang et al. SIGGRAPH 10]
motion-aware video resizing
Make your method aware of potential failure cases:
Motion (e.g., video processing)
Content (e.g., image processing)
Semantics (e.g., image and video indexing/retrieval)
Context (e.g., image understanding)
Occlusion (e.g., detection/tracking)
85 / 94
133. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
86 / 94
134. What is a bad idea?
Naive combination of two or more methods
Avoid a "pipeline system" paper
Blind application of tools
Using feature X and classifier Y without motivation or justification
Following the hype
Too many competitors
Doing something just because it can be done
Do the right things, not just do things right
87 / 94