Sparse representation has found applications in numerous domains, and recent developments have focused on the convex relaxation of the ℓ0-norm minimization for sparse coding (i.e., ℓ1-norm minimization). Nevertheless, the time and space complexities of these algorithms remain significantly high for large-scale problems. As signals in most problems can be modeled by a small set of prototypes, we propose an algorithm that exploits this property and show that the ℓ1-norm minimization problem can be reduced to a much smaller one, thereby gaining significant speed-ups with much lower memory requirements. Experimental results demonstrate that our algorithm achieves double-digit speed-ups while using far less memory than state-of-the-art algorithms.
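The ℓ1-norm minimization at the heart of this line of work can be illustrated with a minimal iterative soft-thresholding (ISTA) solver. This is a generic textbook sketch, not the paper's prototype-based algorithm; the dictionary `A`, signal `y`, and parameters below are made up for illustration:

```python
import numpy as np

def ista(A, y, lam=0.1, step=None, iters=500):
    """Solve min_x 0.5*||Ax - y||^2 + lam*||x||_1 by iterative
    soft-thresholding (a standard l1 solver; illustrative only)."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)        # gradient of the smooth data term
        z = x - step * g             # gradient descent step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return x

# A signal built from 2 of 20 dictionary atoms is recovered sparsely.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
x_true = np.zeros(20)
x_true[3], x_true[11] = 1.5, -2.0
y = A @ x_true
x_hat = ista(A, y, lam=0.01, iters=2000)
```

The paper's point is that when signals live near a small set of prototypes, this problem can be shrunk before solving; the solver itself stays the same.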
Learning Moving Cast Shadows for Foreground Detection (VS 2008) Jia-Bin Huang
The document summarizes a research paper about learning moving cast shadows for foreground detection. It presents a proposed algorithm that uses a confidence-rated Gaussian mixture learning approach and Bayesian framework with Markov random fields to model local and global shadow features. This exploits the complementary nature of local and global features to improve shadow detection. The algorithm is evaluated on outdoor and indoor video sequences, showing improved accuracy over previous methods especially in adaptability to different lighting conditions. Future work could incorporate additional features and more powerful models.
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F... Jia-Bin Huang
This document proposes a learning-based approach to improve the accuracy and robustness of cross-ratio based gaze estimation. It introduces an adaptive homography mapping method that uses both head pose variables and pupil center position as predictor variables in a quadratic regression model. This approach is trained on large amounts of simulated eye tracking data to minimize errors across different head poses and eye parameters. Experimental results show the method achieves state-of-the-art accuracy for both stationary gaze and head movements, and is robust to variations in eye features, sensor resolution, and noise.
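The quadratic-regression idea can be sketched with a hypothetical 2-D predictor (e.g., pupil-center coordinates) and synthetic ground-truth gaze targets; the paper's adaptive homography model uses more variables and large-scale simulated training data:

```python
import numpy as np

def quad_features(X):
    """Expand 2-D predictors (e.g., pupil-center coords) into quadratic
    polynomial features: [1, u, v, u*v, u^2, v^2]."""
    u, v = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), u, v, u * v, u**2, v**2])

def fit_gaze_map(X, Y):
    """Least-squares fit of a quadratic regression from predictors X
    (n x 2) to screen coordinates Y (n x 2)."""
    F = quad_features(X)
    W, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return W

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
# Hypothetical ground-truth quadratic mapping standing in for real data.
Y = np.column_stack([0.5 + X[:, 0] + 0.3 * X[:, 0]**2,
                     -0.2 + 2 * X[:, 1] + 0.1 * X[:, 0] * X[:, 1]])
W = fit_gaze_map(X, Y)
pred = quad_features(X) @ W
```

Training such a regressor on simulated data spanning many head poses is what lets the method compensate for pose-induced errors.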
In this paper, we describe a new interactive image completion system that allows users to easily specify various forms of mid-level structure in the image. Our system supports the specification of four basic symmetry types: reflection, translation, rotation, and glide. The user inputs are automatically converted into guidance maps that encode possible candidate shifts and, indirectly, local transformations of rotation and scale. These guidance maps are used in conjunction with a color matching cost for image completion. We show that our system is capable of handling a variety of challenging examples.
http://www.jiabinhuang.com/
This document summarizes reasons to join the Formosa Volleyball Enthusiasts at UIUC. It notes that the group has experienced success in volleyball tournaments and includes players of all skill levels. It also highlights that the group is active, organizing various social and recreational activities beyond just volleyball. Finally, it provides logistical information about typical volleyball meetup times and locations, as well as contact details to learn more.
Saliency Detection via Divergence Analysis: A Unified Perspective (ICPR 2012) Jia-Bin Huang
A number of bottom-up saliency detection algorithms have been proposed in the literature. Since these have been developed from intuition and principles inspired by psychophysical studies of human vision, the theoretical relations among them are unclear. In this paper, we present a unifying perspective. Saliency of an image area is defined in terms of divergence between certain feature distributions estimated from the central part and its surround. We show that various, seemingly different saliency estimation algorithms are in fact closely related. We also discuss some commonly used center-surround selection strategies. Experiments with two datasets are presented to quantify the relative advantages of these algorithms.
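The center-surround divergence definition can be sketched as follows, using KL divergence between intensity histograms; the binning and Laplace smoothing here are illustrative choices, not the paper's:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def center_surround_saliency(center_vals, surround_vals, bins=16):
    """Saliency of an image area as divergence between feature histograms
    of its central part and its surround."""
    lo = min(center_vals.min(), surround_vals.min())
    hi = max(center_vals.max(), surround_vals.max())
    pc, _ = np.histogram(center_vals, bins=bins, range=(lo, hi))
    ps, _ = np.histogram(surround_vals, bins=bins, range=(lo, hi))
    # +1 Laplace smoothing avoids log(0) on empty bins.
    return kl_divergence(pc.astype(float) + 1, ps.astype(float) + 1)

rng = np.random.default_rng(2)
surround = rng.normal(0.0, 1.0, 5000)
similar_center = rng.normal(0.0, 1.0, 500)    # same distribution -> low saliency
distinct_center = rng.normal(3.0, 0.5, 500)   # different distribution -> high saliency
s_low = center_surround_saliency(similar_center, surround)
s_high = center_surround_saliency(distinct_center, surround)
```

Swapping KL for other divergences (or other feature distributions) recovers many of the seemingly different algorithms the paper unifies.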
Best student paper award in Computer Vision and Robotics Track
This document outlines a process for simulating miniature photography effects through computational techniques. It discusses using depth of field specification and blurring to create a miniature effect from a standard photo. The author presents several methods for specifying depth of field, including using horizontal focus lines, object masks, and salient region detection. Open issues discussed are depth estimation from a single image and improving salient region selection to better automate the miniature photography simulation process.
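The depth-of-field blurring step can be sketched as a blend between a sharp image and a blurred copy, with blur weight growing with distance from a horizontal focus line; the box blur and linear weighting below are simplifications of what a real tilt-shift simulation would use:

```python
import numpy as np

def box_blur_rows(img, k):
    """Horizontal moving-average blur of width 2k+1 (edge-padded)."""
    pad = np.pad(img, ((0, 0), (k, k)), mode="edge")
    c = np.cumsum(pad, axis=1, dtype=float)
    c = np.concatenate([np.zeros((img.shape[0], 1)), c], axis=1)
    return (c[:, 2 * k + 1:] - c[:, :-2 * k - 1]) / (2 * k + 1)

def tilt_shift(img, focus_row, band=10, k=5):
    """Blend a sharp image with a blurred copy; blur weight is 0 inside
    the focus band and ramps to 1 with vertical distance from it."""
    blurred = box_blur_rows(img, k)
    rows = np.arange(img.shape[0])[:, None]
    w = np.clip(np.abs(rows - focus_row) / band - 1.0, 0.0, 1.0)
    return (1 - w) * img + w * blurred

# Striped test image: the focus row stays sharp, distant rows get blurred.
img = np.tile((np.arange(40) % 2).astype(float), (40, 1))
out = tilt_shift(img, focus_row=20, band=5, k=3)
```

The document's harder open problems (depth from a single image, automatic salient-region selection) are about choosing `focus_row` and the band automatically rather than about the blur itself.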
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008) Jia-Bin Huang
In this paper, we propose a fast re-coloring algorithm to improve the accessibility for the color vision impaired. Compared to people with normal color vision, people with color vision impairment have difficulty in distinguishing between certain combinations of colors. Given the increasing use of color in visual media, this can hinder visual communication. To address this problem, we re-map the hue components in the HSV color space based on the statistics of local characteristics of the original color image. We enhance the color contrast through generalized histogram equalization. A control parameter is provided for various users to specify the degree of enhancement to meet their needs. Experimental results are illustrated to demonstrate the effectiveness and efficiency of the proposed re-coloring algorithm.
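The histogram-equalization step with a user strength parameter can be sketched as a blend between the identity map and the cumulative distribution. Applying this to the HSV hue channel, as the paper does, would additionally need to wrap around the hue circle; that detail is omitted here:

```python
import numpy as np

def equalize_with_strength(h, alpha=1.0, bins=64):
    """Histogram-equalize values in [0, 1), blended with the identity by
    a user strength alpha in [0, 1] (0 = unchanged, 1 = full equalization)."""
    hist, _ = np.histogram(h, bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / h.size
    idx = np.clip((h * bins).astype(int), 0, bins - 1)
    return (1 - alpha) * h + alpha * cdf[idx]

# Hues clustered in a narrow range get spread out, increasing contrast.
h = np.linspace(0.0, 0.2, 100)
spread = equalize_with_strength(h, alpha=1.0)
```

The `alpha` blend is one natural reading of the paper's "control parameter ... to specify the degree of enhancement"; the exact formulation may differ.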
This document provides tips and strategies for effectively reading academic papers. It discusses deciding what papers to read based on relevance and credibility. It recommends making best use of academic resources like preprint sites, blogs, and mailing lists to stay updated. It explains the importance of reading for breadth to understand the big picture and reading for depth to critically examine assumptions, methods, statistics and conclusions. The document concludes by discussing how to take notes and think creatively after reading papers to develop new research ideas.
Image Completion using Planar Structure Guidance (SIGGRAPH 2014) Jia-Bin Huang
We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.
Project page: https://sites.google.com/site/jbhuang0604/publications/struct_completion
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015) Jia-Bin Huang
Self-similarity based super-resolution (SR) algorithms are able to produce visually pleasing results without extensive training on external databases. Such algorithms exploit the statistical prior that patches in a natural image tend to recur within and across scales of the same image. However, the internal dictionary obtained from the given image may not always be sufficiently expressive to cover the textural appearance variations in the scene. In this paper, we extend self-similarity based SR to overcome this drawback. We expand the internal patch search space by allowing geometric variations. We do so by explicitly localizing planes in the scene and using the detected perspective geometry to guide the patch search process. We also incorporate additional affine transformations to accommodate local shape variations. We propose a compositional model to simultaneously handle both types of transformations. We extensively evaluate the performance in both urban and natural scenes. Even without using any external training databases, we achieve significantly superior results on urban scenes, while maintaining performance comparable to other state-of-the-art SR algorithms on natural scenes.
http://bit.ly/selfexemplarsr
Estimating Human Pose from Occluded Images (ACCV 2009) Jia-Bin Huang
We address the problem of recovering 3D human pose from single 2D images, in which the pose estimation problem is formulated as a direct nonlinear regression from image observation to 3D joint positions. One key issue that has not been addressed in the literature is how to estimate 3D pose when humans in the scenes are partially or heavily occluded. When occlusions occur, features extracted from image observations (e.g., silhouette-based shape features, histogram of oriented gradients, etc.) are seriously corrupted, and consequently the regressor (trained on un-occluded images) is unable to estimate pose states correctly. In this paper, we present a method that is capable of handling occlusions using sparse signal representations, in which each test sample is represented as a compact linear combination of training samples. The sparsest solution can then be efficiently obtained by solving a convex optimization problem with certain norms (such as the ℓ1-norm). The corrupted test image can be recovered with a sparse linear combination of un-occluded training images, which can then be used for estimating human pose correctly (as if no occlusions exist). We also show that the proposed approach implicitly performs relevant feature selection with un-occluded test images. Experimental results on synthetic and real data sets bear out our theory that with sparse representation 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.
This document provides an overview of image features and categorization in computer vision. It discusses why categorization is important for making predictions about objects and communicating categories. It describes approaches to categorization like definitional, prototype, and exemplar models. Common image features for categorization like color, texture, gradients, and interest points are presented. Methods for representing images as histograms of these features and encoding local descriptors as "bags of visual words" are covered. Deep convolutional neural networks and region-based representations are also summarized. The document aims to explain current techniques for image and region categorization using supervised learning of classifiers on labeled examples and extracted image features.
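The "bag of visual words" representation mentioned above can be sketched with a toy vocabulary built by k-means over 2-D descriptors; real systems use SIFT-like descriptors and vocabularies of thousands of words:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means to build a small 'visual vocabulary' of k words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

def bow_histogram(descriptors, vocab):
    """Represent an image as a normalized histogram of nearest visual words."""
    d = ((descriptors[:, None, :] - vocab[None]) ** 2).sum(-1)
    words = d.argmin(1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(3)
# Training descriptors from two well-separated clusters -> 2-word vocabulary.
train = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
vocab = kmeans(train, 2)
img_desc = rng.normal(0, 0.1, (30, 2))  # one image's descriptors, near word 0
bow = bow_histogram(img_desc, vocab)
```

The resulting fixed-length histogram is what the classifier is trained on, regardless of how many local descriptors each image yields.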
This document provides an overview of a computer vision crash course. It begins with an agenda for the course that includes introductions, fundamentals of computer vision, and recent advances. It then discusses some of the challenges of computer vision and provides examples of computer vision applications such as face detection, recognition, tracking, hand tracking, biometrics, optical character recognition, computer vision in sports, scene reconstruction, and more. It also provides a brief history of the field and discusses some of the fundamentals including light, matching, alignment, geometry, grouping, and recognition.
- The document provides an introduction to linear algebra and MATLAB. It discusses various linear algebra concepts like vectors, matrices, tensors, and operations on them.
- It then covers key MATLAB topics - basic data types, vector and matrix operations, control flow, plotting, and writing efficient code.
- The document emphasizes how linear algebra and MATLAB are closely related and commonly used together in applications like image and signal processing.
General principles and tricks for writing fast MATLAB code.
Powerpoint slides: https://uofi.box.com/shared/static/yg4ry6s1c9qamsvk6sk7cdbzbmn2z7b8.pptx
Research 101 - Paper Writing with LaTeX Jia-Bin Huang
Paper Writing with LaTeX
PDF: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pdf
PPTX: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pptx
This document provides guidance on how to write a clear scientific paper. It discusses the key sections of a paper including the title, abstract, introduction, related work, method, results, and conclusions. The introduction should motivate the problem, prior approaches, contributions, and provide a teaser figure. The related work section should group existing work into topics and compare approaches. The method section should describe the approach with subsections and forward references. The results section covers experiments, metrics, datasets, and includes visual and quantitative results with an ablation study. Figures and tables should be able to stand alone in a presentation. Writing should be concise, consistent, specific and direct with careful use of words, equations, and notation.
Real-time Face Detection and Recognition Jia-Bin Huang
Zelun Luo and Anarghya Mitra created a robust face identification system under professors Jia-Bin Huang and Narendra Ahuja at the University of Illinois. The system uses an integral image and cascade architecture with Haar-like features to identify faces. It can identify multiple faces in an image and faces not in its original database by using a learning algorithm. The integral image allows features to be computed rapidly in constant time, while the cascade structure rejects most non-face sub-windows early.
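The integral-image trick the summary describes can be sketched directly: any rectangle sum becomes four table lookups, which is what makes Haar-like features cheap to evaluate. The two-rectangle feature below is one illustrative example, not the project's exact feature set:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/left column, so any box sum
    is exactly 4 lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) using the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_two_rect(ii, r0, c0, h, w):
    """A two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return (box_sum(ii, r0, c0, r0 + h, c0 + half)
            - box_sum(ii, r0, c0 + half, r0 + h, c0 + w))

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

Because every feature costs a constant number of lookups, the cascade can evaluate thousands of candidate sub-windows per frame and reject most non-face windows after only a few features.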
Here is my updated CV using the ModernCV template (http://www.latextemplates.com/template/moderncv-cv-and-cover-letter).
You can find the TeX source file at https://dl.dropbox.com/u/2810224/Homepage/resume/modern%20style.rar
The document proposes a method for altering undesired facial expressions in photographs by combining landmark-based and appearance-based facial expression transfer. It aims to utilize the availability of large datasets to tackle large pose differences when transferring expressions between images. The method involves using 3D rotation on a reference image to normalize pose differences, then computing an expression flow to transfer facial components using an appearance-based approach. This allows creating morphs between faces in different poses and angles. The goal is to extend this technique to transfer expressions between people of different ethnicities.
Image Smoothing for Structure Extraction Jia-Bin Huang
The document discusses image smoothing techniques for structure extraction. It aims to achieve edge-aware smoothing while distinguishing texture from structure. Previous related work includes Gaussian blurring, L0 gradient minimization, and domain transformations. The proposed algorithm formulates smoothing as a global optimization problem that minimizes a data term plus a total variation regularization term. It uses a Huber loss function and an iteratively reweighted L1 norm to encourage sparsity. Tests will be conducted using source code from previous works. Future work includes implementing the algorithm in CVX and evaluating its effectiveness.
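The optimization described above can be sketched in 1-D: total-variation smoothing by iteratively reweighted least squares, where the ℓ1 term is replaced at each step by a weighted quadratic. The operator construction and parameters are illustrative, not the proposed algorithm's exact formulation:

```python
import numpy as np

def tv_smooth_irls(f, lam=1.0, iters=30, eps=1e-3):
    """1-D total-variation smoothing by iteratively reweighted least
    squares: min_u 0.5*||u - f||^2 + lam*sum|D u|, with the l1 term
    approximated at each step by a weighted quadratic."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)   # forward-difference operator, (n-1) x n
    u = f.astype(float).copy()
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(D @ u), eps)   # IRLS reweighting
        A = np.eye(n) + lam * D.T @ (w[:, None] * D)
        u = np.linalg.solve(A, f)    # weighted least-squares subproblem
    return u

# Noisy step signal: noise is flattened, but the edge survives.
rng = np.random.default_rng(4)
f = np.concatenate([np.zeros(25), np.ones(25)]) + rng.normal(0, 0.05, 50)
u = tv_smooth_irls(f, lam=0.5)
```

This edge-preserving behavior, versus the edge-blurring of a Gaussian filter, is exactly the texture-vs-structure distinction the document is after; a 2-D version adds a second difference operator for the vertical direction.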
Static and Dynamic Hand Gesture Recognition Jia-Bin Huang
This document summarizes work on static and dynamic hand gesture recognition using a webcam. For static gesture recognition, a random forest classifier was used to recognize four hand poses from images. For dynamic gesture recognition, the goal was to track hand movements to control a mouse cursor. Key challenges addressed were skin detection using color spaces, fingertip detection using hand contour curvatures, and calculating hand center position for cursor control. The work provided skills in computer vision techniques like feature extraction and classification algorithms.
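The skin-detection step can be sketched with a classic RGB thresholding rule plus a mask centroid for cursor control; the thresholds below are common textbook values, not the project's tuned ones:

```python
import numpy as np

def skin_mask(frame):
    """Classic RGB skin-color rule (illustrative thresholds):
    R > 95, G > 40, B > 20, R - G > 15, R - B > 15."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    return (r > 95) & (g > 40) & (b > 20) & (r - g > 15) & (r - b > 15)

def hand_center(mask):
    """Centroid of the skin mask, usable to drive a cursor position."""
    ys, xs = np.nonzero(mask)
    return (ys.mean(), xs.mean()) if len(ys) else None

# Synthetic frame: blue background with a skin-colored patch.
frame = np.zeros((20, 20, 3), dtype=np.uint8)
frame[..., 2] = 255                 # blue background, rejected by the rule
frame[5:10, 5:10] = (200, 120, 90)  # skin-toned square
mask = skin_mask(frame)
center = hand_center(mask)
```

Fingertip detection then looks for high-curvature points along the contour of this mask, and the centroid provides the stable reference point for cursor motion.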
Real-Time Face Detection, Tracking, and Attributes Recognition Jia-Bin Huang
This document summarizes a student's research project on real-time face detection, tracking, and attribute recognition. The goal is to develop a system that can detect faces in real-time and identify attributes like gender, race, etc. It will apply techniques like the Viola-Jones face detection framework, L1 minimization tracking, and face attribute recognition. Potential applications include surveillance, security, human-computer interaction, robotics, and more. The document outlines the methods used and references related work.
Information Preserving Color Transformation for Protanopia and Deuteranopia (...) Jia-Bin Huang
This document proposes a new method for recoloring images to make them more comprehensible for those with protanopia and deuteranopia, two types of color blindness. The method aims to preserve color information in the original images while maintaining natural-looking recolored images. It introduces two error functions to measure information preservation and naturalness, which are combined into an objective function using Lagrange multipliers. This function is minimized to obtain optimal color transformation settings. Experimental results show the method can generate more understandable images for those with color deficiencies while keeping recolored images natural-looking for those with normal vision.
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Jia-Bin Huang
This document proposes a fast re-coloring algorithm to improve image accessibility for those with color vision deficiencies. It discusses how color vision impairment affects one's ability to distinguish colors and reviews previous methods for enhancing color representation. The proposed method remaps hue values in the HSV color space based on local image statistics and enhances color contrast through histogram equalization, allowing users to specify the degree of enhancement.
Learning Moving Cast Shadows for Foreground Detection (VS 2008) Jia-Bin Huang
This document summarizes a research paper that presents a new algorithm for detecting foreground objects and moving shadows in surveillance videos. The algorithm uses Gaussian mixture models to learn pixel-based models of cast shadows on background surfaces over time. However, learning pixel-based models can be slow if motion is infrequent. To address this, the algorithm also builds a global shadow model that uses global-level information to help update the local shadow models more quickly. Foreground objects are modeled using nonparametric density estimation of spatial and color information. Finally, background, shadow, and foreground models are combined in a Markov random field energy function that can be efficiently optimized using graph cuts to perform foreground-shadow segmentation. Experimental results demonstrate the effectiveness of the proposed algorithm.
This document provides tips and strategies for effectively reading academic papers. It discusses deciding what papers to read based on relevance and credibility. It recommends making best use of academic resources like preprint sites, blogs, and mailing lists to stay updated. It explains the importance of reading for breadth to understand the big picture and reading for depth to critically examine assumptions, methods, statistics and conclusions. The document concludes by discussing how to take notes and think creatively after reading papers to develop new research ideas.
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Jia-Bin Huang
We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.
Project page: https://sites.google.com/site/jbhuang0604/publications/struct_completion
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Jia-Bin Huang
Self-similarity based super-resolution (SR) algorithms are able to produce visually pleasing results without extensive training on external databases. Such algorithms exploit the statistical prior that patches in a natural image tend to recur within and across scales of the same image. However, the internal dictionary obtained from the given image may not always be sufficiently expressive to cover the textural appearance variations in the scene. In this paper, we extend self-similarity based SR to overcome this drawback. We expand the internal patch search space by allowing geometric variations. We do so by explicitly localizing planes in the scene and using the detected perspective geometry to guide the patch search process. We also incorporate additional affine transformations to accommodate local shape variations. We propose a compositional model to simultaneously handle both types of transformations. We extensively evaluate the performance in both urban and natural scenes. Even without using any external training databases, we achieve significantly superior results on urban scenes, while maintaining comparable performance on natural scenes as other state-of-the-art SR algorithms.
http://bit.ly/selfexemplarsr
Estimating Human Pose from Occluded Images (ACCV 2009)Jia-Bin Huang
We address the problem of recovering 3D human pose from single 2D images, in which the pose estimation problem is formulated as a direct nonlinear regression from image observation to 3D joint positions. One key issue that has not been addressed in the literature is how to estimate 3D pose when humans in the scenes are partially or heavily occluded. When occlusions occur, features extracted from image observations (e.g., silhouettes-based shape features, histogram of oriented gradient, etc.) are seriously corrupted, and consequently the regressor (trained on un-occluded images) is unable to estimate pose states correctly. In this paper, we present a method that is capable of handling occlusions using sparse signal representations, in which each test sample is represented as a compact linear combination of training samples. The sparsest solution can then be efficiently obtained by solving a convex optimization problem with certain norms (such as l1-norm). The corrupted test image can be recovered with a sparse linear combination of un-occluded training images which can then be used for estimating human pose correctly (as if no occlusions exist). We also show that the proposed approach implicitly performs relevant feature selection with un-occluded test images. Experimental results on synthetic and real data sets bear out our theory that with sparse representation 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.
This document provides an overview of image features and categorization in computer vision. It discusses why categorization is important for making predictions about objects and communicating categories. It describes approaches to categorization like definitional, prototype, and exemplar models. Common image features for categorization like color, texture, gradients, and interest points are presented. Methods for representing images as histograms of these features and encoding local descriptors as "bags of visual words" are covered. Deep convolutional neural networks and region-based representations are also summarized. The document aims to explain current techniques for image and region categorization using supervised learning of classifiers on labeled examples and extracted image features.
This document provides an overview of a computer vision crash course. It begins with an agenda for the course that includes introductions, fundamentals of computer vision, and recent advances. It then discusses some of the challenges of computer vision and provides examples of computer vision applications such as face detection, recognition, tracking, hand tracking, biometrics, optical character recognition, computer vision in sports, scene reconstruction, and more. It also provides a brief history of the field and discusses some of the fundamentals including light, matching, alignment, geometry, grouping, and recognition.
- The document provides an introduction to linear algebra and MATLAB. It discusses various linear algebra concepts like vectors, matrices, tensors, and operations on them.
- It then covers key MATLAB topics - basic data types, vector and matrix operations, control flow, plotting, and writing efficient code.
- The document emphasizes how linear algebra and MATLAB are closely related and commonly used together in applications like image and signal processing.
General principles and tricks for writing fast MATLAB code.
Powerpoint slides: https://uofi.box.com/shared/static/yg4ry6s1c9qamsvk6sk7cdbzbmn2z7b8.pptx
Research 101 - Paper Writing with LaTeXJia-Bin Huang
Paper Writing with LaTeX
PDF: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pdf
PPTX: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pptx
This document provides guidance on how to write a clear scientific paper. It discusses the key sections of a paper including the title, abstract, introduction, related work, method, results, and conclusions. The introduction should motivate the problem, prior approaches, contributions, and provide a teaser figure. The related work section should group existing work into topics and compare approaches. The method section should describe the approach with subsections and forward references. The results section covers experiments, metrics, datasets, and includes visual and quantitative results with an ablation study. Figures and tables should be able to stand alone in a presentation. Writing should be concise, consistent, specific and direct with careful use of words, equations, and notation. Overall, the
Real-time Face Detection and Recognition - Jia-Bin Huang
Zelun Luo and Anarghya Mitra created a robust face identification system under professors Jia-Bin Huang and Narendra Ahuja at the University of Illinois. The system uses an integral image and cascade architecture with Haar-like features to identify faces. It can identify multiple faces in an image and faces not in its original database by using a learning algorithm. The integral image allows features to be computed rapidly in constant time, while the cascade structure rejects most non-face sub-windows early.
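The constant-time feature evaluation rests on the summed-area table: precompute cumulative sums once, then any rectangle sum needs only four lookups. A minimal sketch of that idea (not the project's code):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so any
    rectangle sum takes exactly four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in constant time."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum() == 30
```

Haar-like features are differences of such rectangle sums, which is why a cascade can evaluate thousands of them per sub-window in real time.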
Here is my updated CV using the ModernCV template (http://www.latextemplates.com/template/moderncv-cv-and-cover-letter).
You can find the TeX source file at https://dl.dropbox.com/u/2810224/Homepage/resume/modern%20style.rar
The document proposes a method for altering undesired facial expressions in photographs by combining landmark-based and appearance-based facial expression transfer. It aims to utilize the availability of large datasets to tackle large pose differences when transferring expressions between images. The method involves using 3D rotation on a reference image to normalize pose differences, then computing an expression flow to transfer facial components using an appearance-based approach. This allows creating morphs between faces in different poses and angles. The goal is to extend this technique to transfer expressions between people of different ethnicities.
Image Smoothing for Structure Extraction - Jia-Bin Huang
The document discusses image smoothing techniques for structure extraction. The goal is edge-aware smoothing that distinguishes texture from structure. Related prior work includes Gaussian blurring, L0 gradient minimization, and domain transforms. The proposed algorithm formulates smoothing as a global optimization problem that minimizes a data term plus a total variation regularization term, using a Huber loss function and an iteratively reweighted L1 norm to encourage sparsity. Experiments will be conducted against source code from previous works. Future work includes implementing the algorithm in CVX and testing its effectiveness.
Static and Dynamic Hand Gesture Recognition - Jia-Bin Huang
This document summarizes work on static and dynamic hand gesture recognition using a webcam. For static gesture recognition, a random forest classifier was used to recognize four hand poses from images. For dynamic gesture recognition, the goal was to track hand movements to control a mouse cursor. Key challenges addressed were skin detection using color spaces, fingertip detection using hand contour curvature, and computing the hand center position for cursor control. The work built skills in computer vision techniques such as feature extraction and classification.
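The skin-detection step typically thresholds pixels in a color space such as HSV. A minimal sketch of that idea, where the threshold ranges are illustrative placeholders (real systems tune them or learn them from data), not values from this project:

```python
import numpy as np

def skin_mask_hsv(hsv):
    """Binary skin mask from an HSV image (H in [0, 180), S and V in
    [0, 255], OpenCV-style 8-bit ranges). Thresholds are illustrative."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h < 25) & (s > 40) & (s < 220) & (v > 60)

# Toy 2x2 "image": one skin-like pixel, three non-skin pixels.
hsv = np.array([[[10, 120, 180], [100, 200, 200]],
                [[10, 10, 250], [170, 150, 100]]], dtype=np.uint8)
mask = skin_mask_hsv(hsv)
assert mask.tolist() == [[True, False], [False, False]]
```

Fingertips can then be found on the contour of this mask as points of high curvature.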
Real-Time Face Detection, Tracking, and Attributes Recognition - Jia-Bin Huang
This document summarizes a student's research project on real-time face detection, tracking, and attribute recognition. The goal is to develop a system that can detect faces in real-time and identify attributes like gender, race, etc. It will apply techniques like the Viola-Jones face detection framework, L1 minimization tracking, and face attribute recognition. Potential applications include surveillance, security, human-computer interaction, robotics, and more. The document outlines the methods used and references related work.
Estimating Human Pose from Occluded Images (ACCV 2009) - Jia-Bin Huang
We address the problem of recovering 3D human pose from single 2D images, in which the pose estimation problem is formulated as a direct nonlinear regression from image observation to 3D joint positions. One key issue that has not been addressed in the literature is how to estimate 3D pose when humans in the scenes are partially or heavily occluded. When occlusions occur, features extracted from image observations (e.g., silhouette-based shape features, histograms of oriented gradients, etc.) are seriously corrupted, and consequently the regressor (trained on un-occluded images) is unable to estimate pose states correctly. In this paper, we present a method that is capable of handling occlusions using sparse signal representations, in which each test sample is represented as a compact linear combination of training samples. The sparsest solution can then be efficiently obtained by solving a convex optimization problem with certain norms (such as the l1-norm). The corrupted test image can be recovered with a sparse linear combination of un-occluded training images, which can then be used for estimating human pose correctly (as if no occlusions existed). We also show that the proposed approach implicitly performs relevant feature selection with un-occluded test images. Experimental results on synthetic and real data sets bear out our theory that with sparse representation 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.
Information Preserving Color Transformation for Protanopia and Deuteranopia (...) - Jia-Bin Huang
This document proposes a new method for recoloring images to make them more comprehensible for those with protanopia and deuteranopia, two types of color blindness. The method aims to preserve color information in the original images while maintaining natural-looking recolored images. It introduces two error functions to measure information preservation and naturalness, which are combined into an objective function using Lagrange multipliers. This function is minimized to obtain optimal color transformation settings. Experimental results show the method can generate more understandable images for those with color deficiencies while keeping recolored images natural-looking for those with normal vision.
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008) - Jia-Bin Huang
This document proposes a fast re-coloring algorithm to improve image accessibility for those with color vision deficiencies. It discusses how color vision impairment affects one's ability to distinguish colors and reviews previous methods for enhancing color representation. The proposed method remaps hue values in the HSV color space based on local image statistics and enhances color contrast through histogram equalization, allowing users to specify the degree of enhancement.
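The contrast-enhancement step via histogram equalization can be sketched in its standard textbook form; the paper's variant additionally lets users control the degree of enhancement, which this sketch omits:

```python
import numpy as np

def equalize_hist(gray):
    """Classic histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                      # first nonzero CDF value
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[gray]

# Three occupied gray levels spread out over the full [0, 255] range.
gray = np.array([[50, 50], [100, 200]], dtype=np.uint8)
out = equalize_hist(gray)
assert out.tolist() == [[0, 0], [128, 255]]
```

The same lookup-table idea applies per channel when equalizing value or saturation after the hue remapping.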
Learning Moving Cast Shadows for Foreground Detection (VS 2008) - Jia-Bin Huang
This document summarizes a research paper that presents a new algorithm for detecting foreground objects and moving shadows in surveillance videos. The algorithm uses Gaussian mixture models to learn pixel-based models of cast shadows on background surfaces over time. However, learning pixel-based models can be slow if motion is infrequent. To address this, the algorithm also builds a global shadow model that uses global-level information to help update the local shadow models more quickly. Foreground objects are modeled using nonparametric density estimation of spatial and color information. Finally, the background, shadow, and foreground models are combined in a Markov random field energy function that can be efficiently optimized using graph cuts to perform foreground-shadow segmentation. Experimental results demonstrate the effectiveness of the proposed algorithm.
Fast Sparse Representation with Prototypes (CVPR 2010)
Jia-Bin Huang and Ming-Hsuan Yang
Electrical Engineering and Computer Science, University of California, Merced
jbhuang@ieee.org, mhyang@ieee.org
Overview

• Goal: find an approximate sparse solution of the linear system $y = Fx$.
• Original $\ell_1$-norm minimization problem:

    $\min_x \|x\|_1$ subject to $\|y - Fx\|_2 \le \epsilon$.  (1)

• Original sparse coding problem: find a sparse solution from a dense matrix $F$.

Algorithm

• Dictionary learning with the K-SVD algorithm:

    $\min_{D,W} \|F - DW\|_F^2 = \sum_{i=1}^{K} \sum_{j=1}^{n_i} \|f_{i,j} - D w_{i,j}\|_2^2$
    subject to $\|w_{i,j}\|_0 \le S_0$.  (2)

• Approximation process:

    $y \approx D w_y \approx D W x$,  (3)
    $D w_y + e_y = D W x + e_{Fx} \implies D(w_y - Wx) = e_{Fx} - e_y$,  (4)
    $\|D(w_y - Wx)\|_2 \le (s + 1)\epsilon$,  (5)
    $\|z\|_2 = \|w_y - Wx\|_2 \le \frac{(s + 1)\epsilon}{1 - \rho} = \epsilon'$.  (6)

• Transformed sparse coding problem: find a sparse solution from a sparse matrix $W$, i.e., the reduced $\ell_1$-norm minimization problem:

    $\min_x \|x\|_1$ subject to $\|w_y - Wx\|_2 \le \epsilon'$.  (7)
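The reduction in Eqs. (2)-(7) can be sketched end to end: sparse-code the columns of F and the query y over a shared dictionary D, then solve the much smaller problem on (w_y, W). The sketch below substitutes a toy orthonormal dictionary (a QR factor) for the K-SVD dictionary and uses orthogonal matching pursuit for the s0-sparse codes; it shows only the reduction step, not the final l1 solve:

```python
import numpy as np

def omp(D, f, s0):
    """Orthogonal matching pursuit: a greedy s0-sparse code of f over D."""
    r, idx = f.astype(float).copy(), []
    coef = np.array([])
    for _ in range(s0):
        idx.append(int(np.argmax(np.abs(D.T @ r))))   # most correlated atom
        sub = D[:, idx]
        coef, *_ = np.linalg.lstsq(sub, f, rcond=None)
        r = f - sub @ coef                            # orthogonal residual
    w = np.zeros(D.shape[1])
    w[idx] = coef
    return w

rng = np.random.default_rng(0)
d, n, k, s0 = 64, 200, 32, 4
F = rng.standard_normal((d, n))                       # original dense system
D = np.linalg.qr(rng.standard_normal((d, k)))[0]      # toy orthonormal "dictionary"
W = np.column_stack([omp(D, F[:, j], s0) for j in range(n)])
y = F[:, 3]
w_y = omp(D, y, s0)

# The reduced problem (7) works on the k x n sparse W instead of the
# d x n dense F, and the query's code matches the corresponding column of W.
assert np.allclose(w_y, W[:, 3])
```

Because W has at most s0 nonzeros per column and k < d, the reduced l1 problem is both smaller and sparser than the original, which is where the reported speed-ups come from.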
Experimental Results

• Single Image Super-Resolution

Figure: Super-resolution results on test images: Ground-truth, Bicubic, [Yang et al. 08], Proposed.

Table: Execution time and RMSE for sparse coding on four test images (scale factor = 3)

  Image     | Original RMSE | Original Time (s) | Proposed RMSE | Proposed Time (s) | Speedup
  Girl      | 5.6684        | 17.2333           | 6.2837        | 1.5564            | 11.07
  Flower    | 3.3649        | 14.9173           | 3.8710        | 1.3230            | 11.27
  Parthenon | 12.247        | 35.1163           | 13.469        | 3.1485            | 11.15
  Raccoon   | 9.3584        | 27.9819           | 10.148        | 2.3284            | 12.02

• Face Recognition
Applications

• Sparsity-based classification/clustering
• Image restoration (e.g., denoising, inpainting, demosaicking, super-resolution)
• Compressive sensing

Figure: Sample images from the extended Yale database B

Table: Recognition accuracy and speed using the Extended Yale database B

  Feature   | Downsampled image    | PCA
  Method    | Acc (%) | Time (s)   | Acc (%) | Time (s)
  Original  | 93.78   | 20.08      | 95.01   | 13.17
  Proposed  | 91.62   | 0.51       | 92.28   | 0.32
  Speed-up  |         | 39.4       |         | 41.2

• Human Pose Estimation

Figure: Sample images from the INRIA data set
Table: Comparison of pose estimation accuracy and speed under different numbers of prototypes using the INRIA data set

  Number of coefficients      | 3      | 6      | 9      | 12     | 15     | Original
  Mean error (in degrees)     | 9.1348 | 7.9970 | 7.4406 | 7.2965 | 7.1872 | 6.6513
  Execution time (in seconds) | 0.0082 | 0.0734 | 0.3663 | 1.1020 | 2.3336 | 24.69
  Speed-up                    | 3011.0 | 336.4  | 67.4   | 22.4   | 10.6   |

• Multi-view Object Recognition

Table: Comparison of recognition speed and accuracy on the COIL-100 data set (8 and 16 views)

  Recognition accuracy:
  Feature used | 8 views: Orig. (%), Ours (%) | 16 views: Orig. (%), Ours (%)
  Downsample   | 82.43, 80.93                 | 90.01, 87.43
  Downsample   | 75.08, 74.28                 | 84.75, 84.00
  PCA: 256     | 84.56, 82.08                 | 91.03, 89.22
  PCA: 100     | 81.23, 79.23                 | 90.58, 87.72

  Execution time:
  Feature used | 8 views: Orig. (s), Ours (ms), Speed-up | 16 views: Orig. (s), Ours (ms), Speed-up
  Downsample   | 6.85, 3.2, 2140.6                       | 52.73, 3.9, 13520.5
  Downsample   | 4.13, 3.9, 1059.0                       | 48.02, 5.2, 9234.6
  PCA: 256     | 3.71, 3.3, 1124.2                       | 29.58, 3.8, 7784.2
  PCA: 100     | 2.54, 3.6, 705.6                        | 21.00, 5.6, 3750.0

Contributions

• Exploit the fact that signals can be well represented by a sparse linear combination of atom signals
• Reduce the original dense and large problem to a sparse and small problem
Jia-Bin Huang and Ming-Hsuan Yang (EECS, University of California, Merced) IEEE Conference on Computer Vision and Pattern Recognition 2010 (CVPR 2010)