Computer vision has been studied for more than 40 years. Due to the increasingly diverse and rapidly developing topics in vision and related fields (e.g., machine learning, signal processing, cognitive science), the task of coming up with new research ideas is often daunting for junior graduate students. In this talk, I present five methods for coming up with new research ideas. For each method, I give several examples (i.e., existing works in the literature) to illustrate how it works in practice.
This is a common-sense talk and will not involve complicated math equations or theories.
Note: The content of this talk is inspired by the "Raskar Idea Hexagon" - Prof. Ramesh Raskar's talk on "How to Come Up with New Ideas".
To download the presentation slides with videos, please visit
http://jbhuang0604.blogspot.com/2010/05/how-to-come-up-with-new-research-ideas.html
For the video lecture (in Chinese), please visit
http://jbhuang0604.blogspot.com/2010/06/blog-post_14.html
1. How to Come Up With New
Research Ideas?
Jia-Bin Huang
jbhuang0604@gmail.com
Taiwan
May, 2010
1 / 94
2. What is this talk about?
Five approaches to coming up with new ideas in computer vision.
Extensive case studies (i.e., more than one hundred papers).
A common-sense talk. No complicated theories or equations.
I wish someone had told me this before.
Reference
The content of this talk is greatly inspired by the "Raskar Idea Hexagon".
5. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions: neXt = X^d
Combine two or more topics: neXt = X + Y
Re-think the research directions: neXt = X̄
Use powerful tools, find suitable problems: neXt = X↑
Add an appropriate adjective: neXt = Adj + X
3 What is a bad idea?
7. Active Topics in Computer Vision
[Szeliski Computer Vision: Algorithms and Applications 2010]
Digital image processing
Blocks world, line labeling
Generalized cylinders
Pictorial structures
Stereo correspondence
Intrinsic images
Optical flow
Structure from motion
Image pyramids
Scale-space processing
Shape from X
Physically-based modeling
Regularization
Markov Random Fields
Kalman filters
3D range data processing
Projective invariants
Factorization
Physics-based vision
Graph cuts
Particle filtering
Energy-based segmentation
Face recognition and detection
Subspace methods
Image-based modeling/rendering
Texture synthesis/inpainting
Computational photography
Feature-based recognition
MRF inference algorithms
Learning
8. What can we learn from the past?
The topics are diverse and evolve over time.
The ways to come up with new ideas are similar; there are patterns to follow.
9. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions: neXt = X^d
Combine two or more topics: neXt = X + Y
Re-think the research directions: neXt = X̄
Use powerful tools, find suitable problems: neXt = X↑
Add an appropriate adjective: neXt = Adj + X
3 What is a bad idea?
11. Seek different dimensions: neXt = X^d
"The only difference between a rut and a grave is their dimensions." - Ellen Glasgow
12. Seek different dimensions: neXt = X^d
Idea
Can we increase/replace/transform the dimensions of the original problem to get new problems/solutions?
What kind of dimensions can we work on?
1 Concrete dimensions (e.g., space, time, frequency)
2 Abstract dimensions (e.g., properties)
13. EX 1-1. Content-Aware Media Resizing
[Avidan et al. SIGGRAPH 07] [Rubinstein et al. SIGGRAPH 08]
Ideas
Extend dimensions from 2D image to 3D video: image re-targeting ⇒ video re-targeting
Other dimensions? E.g., 4D light field, infrared image, range image.
14. EX 1-2. Video Stitching
[Rav-Acha et al. CVPR 05]
Input video Dynamic Panorama
Ideas
Extend dimensions from image to video, i.e., Image Panorama ⇒
Video Mosaics with Non-Chronological Time
Increase the time dimension in both input and output
12 / 94
15. EX 1-3. Multi-Image Fusion
[Agarwala et al. SIGGRAPH 04]
Ideas
Extend from a single input image to multiple input images ⇒ Digital
Photomontage
Increase the dimension of the input only.
13 / 94
16. EX 1-4. Computational Photography (Coded Photography)
[Raskar et al. SIGGRAPH 04, 06, 08] [Levin et al. SIGGRAPH 07]
Ideas
Coded Photography: reversibly encode information about the
scene in a single photograph
Coding in Time (exposure), Coded Illumination, Coding in Space
(aperture), and Coded Wavelength
Replace a dimension to encode information about the light field
14 / 94
17. EX 1-1. Photography in Low Light Conditions
Flash Blurred Noisy
What can we do?
Flash → Changes the overall scene appearance (cold and gray)
Long exposure time (hand shake) → Blurred image
Short exposure time (insufficient light) → Noisy image
15 / 94
18. EX 1-1-1. Flash/non-Flash Photography
[Petschnigg et al. SIGGRAPH 2004]
Flash No flash Detail transfer with denoising
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a flash/no-flash image pair)
makes the problem much easier.
16 / 94
19. EX 1-1-2. Image Deblurring with Blurred/Noisy Image
Pairs
[Yuan et al. SIGGRAPH 2007]
Blurred Noisy Enhanced noisy Deblurred result
Ideas
The original problem (taking a good photo in low-light, flash-prohibited
environments from a single image) is difficult.
Increasing the dimension of the input (a blurred/noisy image pair)
makes the problem much easier.
17 / 94
20. EX 1-1-3. Robust Flash Deblurring
[Zhou et al. CVPR 2010]
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a blurred/flash image pair)
makes the problem much easier.
18 / 94
21. EX 1-1-4. Dark Flash Photography
[Krishnan et al. SIGGRAPH 2009]
Ideas
The original problem (taking a good photo in low-light
environments from a single image) is difficult.
Increasing the dimension of the input (a dark-flash/noisy image pair)
makes the problem much easier.
19 / 94
22. EX 1-2. Brute-Force Vision
[Hays and Efros SIGGRAPH 07] [Dale et al. ICCV 09] [Agarwal et al. ICCV 09]
[Furukawa et al. ICCV 09]
Ideas
Utilize a large collection of photos.
20 / 94
23. EX 2-1. X Alignment/Registration (pixel, object, scene)
[Liu et al. CVPR 08, ECCV 08] [Berg et al. CVPR 05]
21 / 94
24. EX 2-2. Shape from X (shading, texture, specular)
[Lobay and Forsyth IJCV 06] [Fleming et al JOV 04] [Adato et al ICCV 07]
shading specular
texture specular flow
22 / 94
25. EX 2-3. Depth from X (stereo, (de-)focus, coded
aperture, diffusion, occlusion, semantic label)
[Levin et al. SIGGRAPH 07] [Hoiem et al. ICCV 07] [Liu et al. CVPR 10] [Zhou et al.
CVPR 10]
Coded Aperture Semantic Labels
Occlusion Diffusion
23 / 94
26. EX 2-4. Infer X from a single image (geometric,
geography, illumination)
[Hoiem et al. ICCV 05] [Hays and Efros CVPR 08] [Lalonde et al. ICCV 09]
Geometric
Geography
Illumination
24 / 94
27. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
25 / 94
28. Combine two or more topics neXt = X + Y
To steal ideas from one person is
plagiarism. To steal from many is
research. - Wilson Mizner
26 / 94
29. Combine two or more topics neXt = X + Y
Idea
Can we combine two or more topics to get new problems or
solutions?
What kind of topics can we combine?
1 X, Y are methods
2 X, Y are problems
3 X, Y are areas
27 / 94
30. EX 1-1. Viola-Jones Object Detection Framework
[Viola and Jones CVPR 2001]
Simple feature Integral img Boosting Cascade structure
Ideas
Paper title: Rapid Object Detection using a Boosted Cascade of
Simple Features
Viola-Jones object detection framework = Integral Images (simple
features, 1984) + AdaBoost (1997) + Cascade architecture (long
time ago)
28 / 94
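The integral image that makes the simple features fast can be sketched in a few lines of NumPy. This is a minimal illustration of the data structure, not the authors' implementation: once the integral image is built, any rectangular sum takes only four array lookups.

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y+1, :x+1]; cumulative sums along both axes.
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from four integral-image lookups.
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

Haar-like features are differences of such box sums, so each feature costs a constant number of lookups regardless of window size.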
31. EX 1-2. SIFT Flow = SIFT + Optical Flow
[Liu et al. ECCV 08 CVPR 09]
Motion hallucination
Label transfer
Ideas
Dense sampling in time : optical flow :: dense sampling in world
images : SIFT flow
29 / 94
32. EX 1-3. Visual Tracking with Online Multiple Instance
Boosting
[Babenko et al. CVPR 09]
Ideas
MILTrack = Multiple Instance Boosting (2005) + Online Boosting
Tracking (2006)
30 / 94
33. EX 2-1. High Dynamic Range Image Reconstruction
from Hand-held Cameras
[Lu et al. CVPR 2009]
Ideas
HDR from from Hand-held Cameras = High Dynamic Range
Image Reconstruction + Image Deblurring
31 / 94
34. EX 2-2. Human Body Understanding
[Guan et al. ICCV 09]
Ideas
Human Body Understanding = Shape Reconstruction + Pose
Estimation
32 / 94
35. EX 2-3. Image Understanding
detection, tracking, recognition, segmentation, reconstruction, scene classification,
event recognition
33 / 94
36. EX 2-3-1. Detection + Tracking
[Andriluka et al. CVPR 08]
Ideas
People detection and people tracking are highly correlated
problems.
Combining the two problems can potentially achieve improved
performance on the individual tasks.
34 / 94
37. EX 2-3-2. Object Attribute + Recognition
[Farhadi et al. CVPR 09] [Lampert et al. CVPR 09]
Ideas
Describe images by attributes
Enables knowledge transfer to classes with no visual
examples
35 / 94
38. EX 2-3-2. Object Recognition + Detection
[Yeh et al. CVPR 09]
Ideas
Concurrent object localization and recognition
36 / 94
39. EX 2-3-3. Image Segmentation + Object Recognition
+ Event Recognition
[Li et al. CVPR 09]
Ideas
Combine scene classification, image segmentation, image
annotation
All three tasks are mutually beneficial
37 / 94
40. EX 3-1. SixthSense - A Wearable Gestural Interface
[Mistry and Maes TED 2009]
Ideas
SixthSense = Computer Vision (e.g., tracking, recognition) +
Internet
38 / 94
41. EX 3-2. Sikuli:Picture-driven computing
[Yeh et al. UIST 09] [Chang et al. CHI 10]
Ideas
1. Readability/usability, 2. GUI serialization, 3. Computer vision
on computer-generated figures
39 / 94
42. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
40 / 94
43. Re-think the research directions neXt = X̄
If at first, the idea is not absurd, then
there is no hope for it -
Albert Einstein
41 / 94
44. Re-think the research directions neXt = X̄
Ideas
Do the current research directions really make sense? What's the
key problem?
What could we do?
1 Re-formulate the original problem.
2 Analyze and compare existing approaches; provide insight into the
problems.
42 / 94
45. EX 1-1. Beyond Sliding Windows
[Lampert et al. CVPR 08]
Rectangle set Branch and bound search
Ideas
Sliding window search ⇔ branch-and-bound search
Represent a set of rectangles with 4 intervals
Use branch-and-bound to find the optimal rectangle (object
localization) efficiently
43 / 94
46. EX 1-2. Beyond Categories
[Malisiewicz and Efros CVPR 08, NIPS 09]
Ideas
Explicit categorization ⇔ Implicit categorization
Ask "what is this like?" (association), instead of "what is it?"
(categorization)
44 / 94
47. EX 1-3. Motion-Invariant Photography
[Levin et al. SIGGRAPH 08] [Cho et al. ICCP 10]
Ideas
Still camera ⇔ Moving camera (parabolic exposures)
Enables the use of spatially-invariant blur kernel estimation
45 / 94
48. EX 1-4. Super-resolution from Single Image
[Glasner et al. ICCV 09]
Ideas
Classical multi-image SR / example-based SR ⇔ a unified
single-image SR framework
46 / 94
49. EX 2-1. In Defense of ...
[Boiman et al. CVPR 08] [Hartley PAMI 97]
Nearest-Neighbor Based Image Classification
No quantization of local image descriptors (quantization, used to
generate "bags-of-words" codebooks, discards discriminative
information).
Computes "Image-to-Class" distance, instead of
"Image-to-Image" distance
The performance ranks among the top leading learning-based
image classifiers
The 8-point Algorithm for the fundamental matrix
Normalization, Normalization, Normalization!
Performs almost as well as the best iterative algorithm
47 / 94
50. EX 2-2. Understanding blind deconvolution
[Levin et al. CVPR 2009]
Ideas
Blind deconvolution: recover sharp image x from the blurred one
(y = k ⊗ x + n).
MAPx,k estimation often favors no-blur explanations.
MAPk can be accurately estimated since the kernel size is often
much smaller than the image size.
Blind deconvolution should be addressed in this way: MAPk
estimation + non-blind deconvolution.
48 / 94
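The non-blind half of that recipe can be sketched with a frequency-domain Wiener filter. This is a hedged illustration, not Levin et al.'s algorithm: it assumes the kernel k is already known and that the blur uses circular (periodic) boundary conditions.

```python
import numpy as np

def wiener_deconv(y, k, snr=100.0):
    # Non-blind deconvolution: given blurred signal y = k * x + n and a
    # KNOWN kernel k, estimate x with a regularized inverse filter.
    n = len(y)
    K = np.fft.fft(k, n)   # zero-pad the kernel to the signal length
    Y = np.fft.fft(y)
    # The 1/snr term suppresses noise amplification where |K| is small.
    X = np.conj(K) * Y / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft(X))
```

With the kernel known, deconvolution reduces to a per-frequency division; the hard part of blind deconvolution is estimating k in the first place, which is exactly the MAPk step the slide advocates.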
51. EX 2-3. Understanding camera trade-offs
[Levin et al. ECCV 08]
Ideas
Traditional optics evaluation: 2D image sharpness (e.g., Modulation
Transfer Function)
Modern camera evaluation: How well does the recorded data
allow us to estimate the visual world - the lightfield?
49 / 94
52. EX 2-4. What is a good image segment?
[Bagon et al. ECCV 08]
Ideas
A good image segment is one that can be easily composed from
its own pieces, but is difficult to compose from pieces of other
parts of the image
50 / 94
53. EX 2-5. Lambertian Reflectance and Linear
Subspaces
[Basri and Jacobs PAMI 03]
Ideas
The set of all Lambertian reflectance functions (the mapping from
surface normals to intensities) obtained with arbitrary distant light
sources lies close to a 9D linear subspace.
Explain prior empirical results using linear subspace methods.
51 / 94
54. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
52 / 94
55. Use powerful tools, find suitable problems neXt = X ↑
If the only tool you have is a hammer,
you tend to see every problem as a
nail. - Abraham Maslow
53 / 94
56. Use powerful tools, find suitable problems neXt = X ↑
What kinds of tools should we understand?
Calculus of Variations
Dimensionality Reduction
Spectral Methods (specifically, spectral clustering)
Probabilistic Graphical Model
Structured Prediction
Bilateral Filtering
Sparse Representation
and more, e.g., spectral theory, information theory, (convex)
optimization, etc.
54 / 94
57. EX 1. Calculus of Variations (1/2)
From Calculus to Calculus of Variations
Calculus ⇔ Calculus of Variations
Functions f: R^n → R ⇔ Functionals (functions of functions) f: F → R, f(u) = ∫_{x1}^{x2} L(x, u(x), u'(x)) dx
Derivative df(x)/dx = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx ⇔ Variation δf(u)/δu = (∂/∂ε) f(u + ε·δu)|_{ε=0}
Local extremum: df(x)/dx = 0 ⇔ Local extremum: Euler-Lagrange equation
Total Variation (TV)
TV(y) = ∫_{x0}^{x1} |y'(x)| dx: the "oscillation strength" of y(x)
55 / 94
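As a concrete illustration of the TV functional (a minimal sketch, assuming a unit-spaced 1-D signal), the discrete total variation is just the sum of absolute differences:

```python
import numpy as np

def total_variation(y):
    # Discrete total variation: sum of |y[i+1] - y[i]|.
    # On a unit-spaced grid this approximates ∫ |y'(x)| dx,
    # since |Δy/Δx| · Δx = |Δy|.
    return float(np.abs(np.diff(y)).sum())
```

A monotone signal has TV equal to its range; every oscillation adds to it, which is why TV works as a measure of "oscillation strength" for denoising.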
58. EX 1. Calculus of Variations (2/2)
Total Variation Denoising/Inpainting
Applications in computer vision
Optical flow [Horn and Schunck AI 81]
Shape from shading [Horn and Brooks CVGIP 86]
Edge detection [PAMI 87]
Anisotropic diffusion [Perona and Malik PAMI 90]
Active contours model [Kass et al. IJCV 88]
Image segmentation [Morel and Solimini 95]
Image restoration [Aubert and Vese SIAM Journal on NA 97]
56 / 94
60. EX 2. Dimensionality Reduction (1/2)
Why do we need dimensionality reduction?
Since high-dimensional data is everywhere (e.g., images, human gene
distributions, weather prediction), we need dimensionality reduction for
1 processing data efficiently
2 estimating the distributions of data accurately (curse of
dimensionality)
3 finding meaningful representations of data
Classification of dimensionality reduction methods
Linear, global structure preserved: PCA, LDA
Linear, local structure preserved: LPP, NPE
Nonlinear, global structure preserved: ISOMAP, Kernel PCA, DM
Nonlinear, local structure preserved: LLE, LE, HE
57 / 94
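As a minimal example of the linear, global-structure-preserving corner of that classification, PCA via the SVD can be sketched as follows (an illustration of the tool, not tied to any particular paper):

```python
import numpy as np

def pca(X, k):
    # X: (n_samples, n_features) data matrix.
    # Center the data, then take the top-k right singular vectors as the
    # principal directions (directions of maximum global variance).
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]               # (k, n_features), orthonormal rows
    return Xc @ components.T, components
```

Nonlinear methods such as ISOMAP or LLE replace the global linear projection with one that preserves geodesic or local-neighborhood structure instead.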
62. EX 2. Dimensionality Reduction (2/2)
Applications in computer vision
Subspace as constraints
Structure from motion [Tomasi and Kanade IJCV 92], Optical flow
[Irani IJCV 02], Layer extraction [Ke and Kanade CVPR 01], Face
alignment [Saragih et al. ICCV 09]
Face recognition (e.g., PCA, LDA, LPP)
PCA [Turk and Pentland PAMI 91], LDA [Belhumeur et al. PAMI 97],
LPP [He et al. PAMI 05], Random [Wright et al. PAMI 09]
Motion segmentation
subspace separation [Kanatani ICCV 01] [Yan and Pollefeys ECCV
06] [Rao et al. CVPR 08] [Lauer and Schnorr ICCV 09]
Lighting
linear subspace [Belhumeur and Kriegman IJCV 98] [Georghiades
et al. PAMI 01] [Lee et al. PAMI 05] [Basri and Jacobs PAMI 03]
Visual tracking
incremental subspace learning [Ross et al. IJCV 08] [Li et al. CVPR
08]
58 / 94
67. EX 3. Spectral Clustering (1/3)
Why is spectral clustering popular?
Can be solved efficiently by standard linear algebra software
Very often outperforms traditional clustering algorithms
Spectral clustering algorithm
Input: a set of data points
1 Construct a similarity graph, e.g., ε-neighborhood, k-nearest neighbor,
fully connected
2 Construct a graph Laplacian, e.g., (un)normalized (L, Lrw, Lsym)
3 Compute the first k eigenvectors (those with the smallest eigenvalues)
of L: v1, ..., vk
4 Let V ∈ R^(n×k) be the matrix containing v1, ..., vk as columns
5 Cluster the row vectors yi of V with the k-means algorithm into clusters
C1, ..., Ck
Output: Clusters A1, ..., Ak with Ai = {j | yj ∈ Ci}
59 / 94
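The five steps above can be sketched directly in NumPy. This is a minimal illustration assuming a fully connected Gaussian-similarity graph, the unnormalized Laplacian, and a tiny farthest-point-initialized Lloyd's k-means; a real implementation would use a sparse graph and a more robust k-means.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    # 1. Fully connected similarity graph with a Gaussian kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # 3. Eigenvectors for the k smallest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]                      # 4. rows of V are the embedded points
    # 5. k-means on the rows of V: farthest-point init, then Lloyd iterations.
    centers = [V[0]]
    for _ in range(1, k):
        d = ((V[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(V[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((V[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = V[labels == j].mean(axis=0)
    return labels
```

The point of the spectral embedding is that clusters which are not linearly separable in the input space become tight, well-separated point groups in the rows of V, where plain k-means suffices.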
74. EX 3. Spectral Clustering (2/3)
Why does it work?
Graph Cut Point of View: Construct a partition that minimizes the
weight across the cut (the well-known mincut problem) while
balancing the clusters (e.g., RatioCut, Normalized Cut).
Random Walks Point of View: When minimizing Ncut, we
actually look for a cut through the graph such that a random walk
seldom transitions from one cluster to another.
Perturbation Theory Point of View: The distance between
eigenvectors of the ideal and the nearly ideal graph Laplacian is
bounded by a constant times a norm of the error matrix. If the
perturbations are small enough, the k-means algorithm
will still separate the groups from each other.
60 / 94
77. EX 3. Spectral Clustering (3/3)
[Shi and Malik PAMI 02]
Eigenvectors carry contour information.
61 / 94
78. EX 4. Probabilistic Graphical Model (1/2)
What are probabilistic graphical models?
A marriage between probability theory and graph theory.
A natural tool for dealing with uncertainty and complexity
Provides a way to view all probabilistic systems (e.g., mixture
models, factor analysis, hidden Markov models, Kalman filters and
Ising models) as instances of a common underlying formalism.
62 / 94
80. EX 5. Structured Prediction (1/2)
What is structured prediction?
Structured prediction is a framework for solving problems of
classification or regression in which the output variables are
mutually dependent or constrained.
Lots of examples
Natural language parsing
Machine translation
Object segmentation
Gene prediction
Protein alignment
Numerous tasks in computational linguistics, speech, vision,
biology.
64 / 94
82. EX 5. Structured Prediction (2/2)
Applications [Lampert et al. ECCV 08] [Desai et al. ICCV 09]
65 / 94
83. EX 6. Bilateral Filtering (1/3)
What’s Bilateral Filtering?
A technique to smooth images while preserving edges
Ubiquitous in image processing, computational photography
66 / 94
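The edge-preserving smoothing can be sketched with a brute-force NumPy implementation (a minimal illustration for small grayscale images in [0, 1]; practical versions use the fast approximations from the computational photography literature):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    # Each output pixel is a weighted mean of its neighborhood, with
    # weights combining spatial distance (sigma_s) and intensity
    # difference (sigma_r); large intensity jumps get ~zero weight,
    # which is what preserves edges.
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    pad = np.pad(img, radius, mode='edge')
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rangew = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            w = spatial * rangew
            out[i, j] = (w * patch).sum() / w.sum()
    return out
```

Dropping the range term (sigma_r → ∞) recovers an ordinary Gaussian blur, which smooths across edges; the range term is the entire difference.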
84. EX 6. Bilateral Filtering (2/3)
[Bennett and McMillan SIGGRAPH 05] [Eisemann and Durand SIGGRAPH 04] [Jones
et al. SIGGRAPH 03] [Winnem¨oller et al. SIGGRAPH 06] [Bae et al. SIGGRAPH 02]
67 / 94
85. EX 6. Bilateral Filtering (3/3)
How does the bilateral filter relate to other methods?
Interpretation
The bilateral filter is equivalent to mode filtering in local histograms
The bilateral filter can be interpreted in terms of robust statistics since
it is related to a cost function
The bilateral filter is a discretization of a particular kind of
PDE-based anisotropic diffusion
68 / 94
87. EX 7. Sparse Representation (1/4)
Ideas
Natural signals (e.g., audio, images) usually admit a sparse
representation (i.e., can be well represented by a linear
combination of a few atom signals)
Successfully applied to various areas in signal/image processing,
vision, and graphics.
69 / 94
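A standard way to compute such a representation is a greedy pursuit. Below is a minimal Orthogonal Matching Pursuit sketch (an illustration of the idea, assuming a dictionary with unit-norm columns; production code would use an optimized solver):

```python
import numpy as np

def omp(D, y, k):
    # Greedily pick k atoms (columns of D, unit norm) to approximate y.
    residual, support = y.copy(), []
    for _ in range(k):
        # Atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares fit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

The returned x is k-sparse: y ≈ Dx with only k nonzero coefficients, which is exactly the "linear combination of a few atom signals" the slide describes.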
88. EX 7. Sparse Representation (2/4)
Image Restoration [Aharon et al. TSP 06] [Julien et al. TIP 08]
Denoising, inpainting, demosaicking
70 / 94
89. EX 7. Sparse Representation (3/4)
Classification [Wright et al. PAMI 09] [Julien et al. CVPR ECCV NIPS 08]
Face recognition, edge detection, texture classification, pixel classification
71 / 94
90. EX 7. Sparse Representation (4/4)
Compressive sensing [Donoho TIT 06] [Candes and Tao TIT 05 06]
and more (e.g., low-rank matrix completion, robust PCA)
72 / 94
91. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X ↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
73 / 94
92. Add an appropriate adjective neXt = Adj + X
There is only one religion, though
there are a hundred versions of it. -
George Bernard Shaw
74 / 94
93. Add an appropriate adjective neXt = Adj + X
What kinds of adjective can we use?
linear ⇔ non-linear
generative/reconstructive ⇔ discriminative
rule-based / hand-designed ⇔ learning-based
single scale ⇔ multi-scale
single step ⇔ progressive
batch processing ⇔ incremental / online processing
fixed ⇔ adaptive / dynamic to data
parametric ⇔ non-parametric
Z-invariant (Z = translation / scale / rotation / noise / facial
expression / pose / lighting / occlusion)
Z-aware (Z = motion / content / semantic / context / occlusion)
75 / 94
103. EX 1. Linear ⇔ Non-linear
Hard to find a straight line to separate them into two clusters?
Ideas
Linear methods may not capture the nonlinear structure in the
original data representation
Nonlinear methods
Kernel tricks (e.g., kernel PCA, kernel LDA, kernel SVM)
Manifold learning (e.g., ISOMAP, LLE, Laplacian eigenmaps)
76 / 94
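The kernel idea above can be seen with an explicit feature map instead of a kernel function. The sketch below (illustrative only, numpy-only, synthetic data) builds two concentric rings that no straight line can separate in 2-D, then maps each point (x1, x2) to (x1, x2, x1² + x2²); the added squared-radius coordinate makes the classes separable by a single threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: no straight line separates them in 2-D.
def make_rings(n=200):
    theta = rng.uniform(0, 2 * np.pi, n)
    inner = np.c_[0.5 * np.cos(theta), 0.5 * np.sin(theta)]
    outer = np.c_[2.0 * np.cos(theta), 2.0 * np.sin(theta)]
    X = np.vstack([inner, outer])
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = make_rings()

# Explicit nonlinear feature map phi(x) = (x1, x2, x1^2 + x2^2):
# the squared-radius coordinate separates the rings linearly in 3-D.
Z = np.c_[X, (X ** 2).sum(axis=1)]

# A single threshold on the new coordinate already classifies perfectly
# (inner ring has r^2 = 0.25, outer ring has r^2 = 4).
pred = (Z[:, 2] > 1.0).astype(float)
accuracy = (pred == y).mean()
print(accuracy)  # 1.0
```

Kernel methods (kernel PCA, kernel SVM) achieve the same effect implicitly, without ever computing the mapped coordinates.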
105. EX 2. Generative ⇔ Discriminative
Classification task: X → Y
Generative classifiers estimate class-conditional pdfs P(X|Y) and
prior probabilities P(Y)
Naive Bayes, Mixtures of Gaussians, Mixtures of experts, Hidden
Markov Models (HMM), Sigmoidal belief networks, Bayesian
networks, Markov random fields (MRF)
Discriminative classifiers estimate posterior probabilities P(Y|X)
Logistic regression, SVMs, Traditional neural networks, Nearest
neighbor, Conditional Random Fields (CRF)
Bayes' rule: P(Y|X) = P(X|Y)P(Y) / P(X)
Two different perspectives on viewing a problem
77 / 94
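A minimal generative classifier can be sketched directly from Bayes' rule. The toy example below (illustrative, numpy-only, 1-D synthetic data) fits a Gaussian class-conditional density P(X|Y) per class, multiplies by the prior P(Y), and predicts the class with the larger posterior, exactly the generative recipe on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two 1-D classes with Gaussian class-conditional densities.
x0 = rng.normal(-2.0, 1.0, 500)  # class 0
x1 = rng.normal(+2.0, 1.0, 500)  # class 1
X = np.r_[x0, x1]
y = np.r_[np.zeros(500), np.ones(500)]

# Generative route: estimate P(X|Y) and P(Y), then apply Bayes' rule.
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu0, s0 = x0.mean(), x0.std()
mu1, s1 = x1.mean(), x1.std()
prior0 = prior1 = 0.5

# P(Y|X) is proportional to P(X|Y) P(Y); P(X) cancels in the comparison.
post0 = gauss_pdf(X, mu0, s0) * prior0
post1 = gauss_pdf(X, mu1, s1) * prior1
pred = (post1 > post0).astype(float)

print((pred == y).mean())  # high accuracy, close to the Bayes optimum (~0.98)
```

A discriminative classifier (e.g., logistic regression) would skip density estimation and model P(Y|X) directly, which is the other perspective on the same problem.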
108. EX 3. Rule-based / Hand-designed ⇔ Learning-based
Hard to find rules to recognize digits?
Ideas
It may be difficult to design a set of rules for certain tasks, such as
handwritten digit recognition
Turn to machine learning methods instead
78 / 94
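The "learn instead of hand-code" idea can be shown with the simplest possible learner. The sketch below (illustrative only; the 3×3 "digit" templates and noise level are made up for the demo) trains a perceptron on noisy binary patterns for 0 and 1 instead of hand-writing rules like "a one is a vertical stroke":

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3x3 binary templates standing in for handwritten 0 and 1.
ZERO = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]], float).ravel()
ONE = np.array([[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]], float).ravel()

def noisy(template, n, flip_p=0.1):
    """n noisy copies of a template, each pixel flipped with prob. flip_p."""
    X = np.tile(template, (n, 1))
    flips = rng.random(X.shape) < flip_p
    return np.abs(X - flips)

X = np.vstack([noisy(ZERO, 50), noisy(ONE, 50)])
y = np.r_[np.zeros(50), np.ones(50)]

# Instead of hand-designing rules, learn a linear classifier (perceptron)
# from labeled examples.
w, b = np.zeros(9), 0.0
for _ in range(20):                      # a few passes over the data
    for xi, yi in zip(X, y):
        pred = float(w @ xi + b > 0)
        w += (yi - pred) * xi            # perceptron update rule
        b += (yi - pred)

acc = ((X @ w + b > 0).astype(float) == y).mean()
print(acc)  # near-perfect on this easy toy problem
```

The weights are discovered from data; no rule about strokes or corners was ever written down, which is the whole point of the learning-based alternative.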
109. EX 4. Single scale ⇔ Multi-scale
[Zelnik-Manor and Perona NIPS 04]
Ideas
We live in a multi-scale world (atom ↔ universe)
Image pyramids / scale-space theory / wavelet representations →
all attempt to capture the multi-scale properties of signals/images
79 / 94
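An image pyramid, the simplest of the multi-scale representations listed above, can be sketched in a few lines. This is an illustrative numpy-only version: real Gaussian pyramids smooth with a 5-tap Gaussian kernel before subsampling, while this sketch uses a 2×2 box filter to stay dependency-free:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 box blur followed by factor-2 subsampling.
    (A Gaussian pyramid would use a Gaussian kernel here instead.)"""
    h, w = img.shape
    h, w = h - h % 2, w - w % 2          # crop to even dimensions
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def pyramid(img, levels):
    """Stack of progressively coarser copies of the image."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out

img = np.random.default_rng(3).random((64, 64))
levels = pyramid(img, 4)
print([l.shape for l in levels])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

A multi-scale method then runs its analysis at every level, so structures too large (or too small) to be seen at one scale are caught at another.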
110. EX 5. Single step ⇔ Progressive
[Yuan et al. SIGGRAPH 08]
Ideas
Some problems are difficult to solve in one step → solve them
progressively
80 / 94
111. EX 6. Batch processing ⇔ Incremental / Online
processing
Ideas
Online methods can handle potentially infinite data samples and
time-varying data
Examples
PCA → Incremental PCA (many variants)
LDA → Incremental LDA (many variants)
SVM → Incremental and decremental SVM [Cauwenberghs and
Poggio NIPS 01]
Dictionary learning (e.g., K-SVD) [Aharon and Elad TSP 06] →
Online dictionary learning [Mairal et al. ICML/JMLR 09]
AdaBoost → Online boosting [Grabner and Bischof CVPR 06]
Multiple instance boosting → Online multiple instance boosting
[Babenko et al. CVPR 09]
81 / 94
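The batch → online pattern is easiest to see on the smallest possible statistic. The sketch below (illustrative only; the incremental-PCA variants above follow the same spirit with more machinery) computes a running mean one sample at a time, never storing the stream, and matches the batch answer:

```python
import numpy as np

rng = np.random.default_rng(4)

# Online (Welford-style) mean update: process one sample at a time,
# never storing the whole (potentially infinite) data stream.
def online_mean(stream):
    mean, n = 0.0, 0
    for x in stream:
        n += 1
        mean += (x - mean) / n   # incremental correction toward the new sample
    return mean

data = rng.normal(5.0, 2.0, 10_000)

# The online estimate agrees with the batch computation.
print(np.isclose(online_mean(data), data.mean()))  # True
```

Incremental PCA, online dictionary learning, and online boosting all generalize this idea: keep a compact summary, update it per sample (or per mini-batch), and discard the raw data.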
117. EX 7. Fixed ⇔ Adaptive / Dynamic
[Elad and Aharon TIP 06]
Ideas
Adaptive approaches usually outperform the predefined/fixed
ones.
82 / 94
118. EX 8. Parametric ⇔ Non-parametric
Probability density estimation
Parametric
Assumes a specific functional form with parameter θ
e.g., Gaussian distribution with unknown mean and variance, mixture
of Gaussians
Parameter estimation
Estimative approach: p(x) = p(x|θ_best)
Bayesian approach: p(x) = ∫ p(x|θ)p(θ)dθ
Non-parametric
Does not assume a specific form of the probability distribution
e.g., histogram, kernel density estimation (the Parzen window method)
83 / 94
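Both density-estimation routes fit in a few lines. The sketch below (illustrative, numpy-only, synthetic standard-normal data) estimates the same density parametrically, by fitting the two Gaussian parameters, and non-parametrically, by a histogram that assumes no functional form; near the mode both should land close to 1/√(2π) ≈ 0.399:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(0.0, 1.0, 5_000)

# Parametric: assume a Gaussian form, estimate its two parameters.
mu, sigma = data.mean(), data.std()
def gauss_pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Non-parametric: a histogram makes no assumption about the functional form.
counts, edges = np.histogram(data, bins=50, density=True)

# Compare the two estimates in the bin nearest x = 0.
centers = 0.5 * (edges[:-1] + edges[1:])
mode_bin = np.argmin(np.abs(centers))
print(round(gauss_pdf(0.0), 2), round(counts[mode_bin], 2))  # both near 0.40
```

The trade-off on the slide shows up directly: the parametric estimate is smooth but wrong if the assumed form is wrong; the histogram adapts to any shape but is noisier and needs more data.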
120. EX 9. Z - invariant
Make your method robust to potential sources of performance degradation:
noise (e.g., Gaussian additive noise, impulse noise, non-uniform
noise) (e.g., image restoration)
translation shift (e.g., near-duplicate image/video detection, image
search)
scale change (e.g., object detection, feature extraction)
perspective distortion (e.g., feature extraction)
deformation (e.g., non-rigid registration, part-based object
detection)
pose variation (e.g., human pose estimation)
lighting variation (e.g., face recognition)
partial occlusion (e.g., object detection and recognition)
84 / 94
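One classic way to get translation invariance, relevant to the near-duplicate detection example above, is to represent a signal by its Fourier magnitude: a (circular) shift changes only the phase of the transform, never the magnitude. A minimal numpy sketch:

```python
import numpy as np

# The magnitude of the Fourier transform is invariant to circular shifts:
# shifting a signal multiplies its spectrum by a phase factor, so |F| is
# unchanged. This gives a simple translation-invariant representation.
rng = np.random.default_rng(6)
signal = rng.random(128)
shifted = np.roll(signal, 17)           # translated copy of the same signal

mag = np.abs(np.fft.fft(signal))
mag_shifted = np.abs(np.fft.fft(shifted))

print(np.allclose(mag, mag_shifted))    # True: magnitudes match
print(np.allclose(signal, shifted))     # False: raw signals differ
```

The same design strategy, building a representation in which the nuisance factor Z provably cancels, is what the scale-, rotation-, and lighting-invariant methods on the slide pursue for their respective Z.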
128. EX 10. Z - aware
[Wang et al. SIGGRAPH Asia 09] [Wang et al. SIGGRAPH 10]
motion-aware video resizing
Make your method aware of potential failure cases:
Motion (e.g., video processing)
Content (e.g., image processing)
Semantics (e.g., image and video indexing/retrieval)
Context (e.g., image understanding)
Occlusion (e.g., detection/tracking)
85 / 94
133. Outline
1 Introduction
2 Five ways to come up with new ideas
Seek different dimensions neXt = X^d
Combine two or more topics neXt = X + Y
Re-think the research directions neXt = X̄
Use powerful tools, find suitable problems neXt = X↑
Add an appropriate adjective neXt = Adj + X
3 What is a bad idea?
86 / 94
134. What is a bad idea?
Naive combination of two or more methods
Avoid a "pipeline system" paper
Blind application of tools
Using feature X and classifier Y without motivation or justification
Following the hype
Too many competitors
Doing something just because it can be done
Do the right things, not just do things right
87 / 94