3-D Reconstruction of Human Organs Using Feature Detection Algorithms
Video Filtering via Conventional Feature Detection Algorithms for
3-D Reconstruction of Human Organs
Mark Chang | Human Photonics Laboratory (HPL) | Department of Mechanical Engineering, University of Washington
Abstract—Marrying computer vision algorithms with modern medical instruments allows 3-D reconstruction of whole organs, which significantly aids the development of better and
quicker diagnosis and treatment of diseases. As part of preparing images for 3-D reconstruction, feature detection algorithms are implemented to demonstrate successful filtering of
redundant frames from a bladder phantom video obtained using a Scanning Fiber Endoscope.
I. Introduction
Over the past 20 years, advances in computer vision algorithms and medical
instruments have complemented each other, improving diagnosis and
treatment while saving patients time and money.
This poster serves to document a video filtering algorithm used to acquire
non-redundant images from a bladder phantom video taken with the HPL
Scanning Fiber Endoscope (SFE). In the future, these images will then be
used to generate a 3-D reconstruction of the bladder phantom.
II. Background
The HPL SFE is an ultra-thin, flexible endoscope capable of acquiring images
at its tip. The SFE recorded a video of a bladder phantom at 30 fps, and the
endoscope was moved by hand every couple of seconds to obtain the video used
in this poster. As a result, without proper video filtering, many adjacent
frames in the video are almost identical and carry redundant information
for 3-D reconstruction. Below are two frames from the video:
Figure 1. Frames 136 & 229 from Bladder Phantom SFE Video
To the human eye, the vertical shift of the veins (blue) is obvious, and thus
understanding that the two images are different from each other is intuitive.
However, the computer must be provided with what are known as keypoint
detectors and keypoint descriptors in order to understand that the two
images are different from one another.
Keypoints are points of interest that the computer uses to recognize unique
features of images, for image representation, object recognition, 3-D
reconstruction, and more. Important qualities for keypoint detectors are
repeatability and robustness to transformations (scale and rotation) and
noise. Keypoint descriptors must be able to provide the most important and
distinctive information in a computationally efficient manner.
Over the past 20 years, numerous feature detection and matching
algorithms have been developed. Of these, Scale Invariant Feature
Transform (SIFT), Speeded Up Robust Features (SURF), and Binary Robust
Invariant Scalable Keypoints (BRISK) are among the most modern and robust
techniques. Because SIFT is patented and not available for public use, SURF
and BRISK are studied here.
Figure 2. Feature Matching Results from SURF (left) and BRISK (right)
With the tuned parameters, BRISK found more features per frame. However,
both feature detection algorithms produced similar results in terms of the
final number of filtered images, especially considering that the parameters
can be tuned further, which would shift the results. If the results are
extended to 3-D reconstruction, BRISK is recommended over SURF because it
found more features per frame, and 3-D reconstruction relies on distinct
features to build around.
V. Conclusion
Both SURF and BRISK were robust enough to filter redundant images from
the bladder phantom video taken with the SFE. Future work involves
investigating the SIFT algorithm for comparison against SURF and BRISK, and
performing a 3-D reconstruction from the filtered images to confirm that the
quality of the filtering was sufficient. Furthermore, the algorithm can be
tested under real clinical conditions, such as transient distortions
(refractive index changes as new urine is pulsed into the bladder), bubbles,
"floaters," noise, and non-uniform illumination. Comparing the quality of
3-D reconstruction using the pre-processing algorithm here to previous work
done on the phantom is also of interest.
Figure 3. Previous Work on 3-D Reconstruction of Bladder Phantom [4]
Acknowledgement & References
This poster was created as a part of the University of Washington course AMATH582:
Computational Methods of Data Analysis, instructor Nathan Kutz. The bladder phantom video was
contributed by Louis Gong and Eric Seibel of the Human Photonics Laboratory, University of Washington.
[1] Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L., "SURF: Speeded Up Robust Features," Computer Vision and
Image Understanding (CVIU) 110(3), 346-359 (2008).
[2] Burkhardt, M. R., Soper, T. D., Yoon, W. J., and Seibel, E. J., "Controlling the trajectory of a scanning fiber
endoscope for automatic bladder surveillance," IEEE/ASME Transactions on Mechatronics 19(1), 366-373
(2014).
[3] Leutenegger, S., Chli, M., and Siegwart, R., "BRISK: Binary Robust Invariant Scalable Keypoints,"
Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2011).
[4] Ye, X., Gong, Y., Yoon, W. J., "Development of Multisegment Steering Mechanism and 3-D panorama for
Automated Bladder Surveillance System," IEEE/ASME Transactions on Mechatronics 21(2), 993-1003
(2016).
SURF vs. BRISK
Both SURF and BRISK are known for their scale and rotation invariance and
robustness to noise. In general, BRISK can be thought of as the more
modern method, with robustness similar to SURF but higher speed.
SURF: For its detector, SURF uses what the authors call a 'Fast-Hessian
Detector,' which adapts the regular Hessian matrix so that its
determinant accounts for both location and scale at once, as in equation (1).
For its descriptor, it uses Haar wavelet responses around the keypoints [1].
(1)
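As a hedged reconstruction of the form equation (1) refers to, written in the notation of Bay et al. [1] and not necessarily the poster's own typesetting, the scale-dependent Hessian at image point x = (x, y) and scale σ, together with the approximated determinant used by the detector, can be sketched as:

```latex
\mathcal{H}(\mathbf{x}, \sigma) =
\begin{bmatrix}
L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\
L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma)
\end{bmatrix},
\qquad
\det(\mathcal{H}_{\text{approx}}) = D_{xx} D_{yy} - (w\, D_{xy})^{2}, \quad w \approx 0.9
```

Here L_xx(x, σ) is the convolution of the Gaussian second-order derivative with the image at x (and similarly for L_xy and L_yy), while D_xx, D_yy, and D_xy are the box-filter approximations of those responses. Because σ appears in every entry, a maximum of the determinant selects a location and a scale together.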
BRISK: Unlike SURF, BRISK relies on the FAST 9-16 detector, applied across
multiple image layers of octaves and sub-octaves, which determines the
existence of an interest point by checking whether the contrast between the
point and nearby points exceeds a threshold value. The BRISK descriptor is a
binary string built by comparing brightness intensities in the surroundings
of the keypoint, as in equation (2) below, where 'i' represents x-direction
and 'j' represents y-direction [3].
(2)
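As a hedged sketch of the brightness comparison behind equation (2), written in the notation of Leutenegger et al. [3], which may differ from the poster's own equation, each bit b of the descriptor comes from one pair of smoothed sample intensities around the keypoint:

```latex
b =
\begin{cases}
1, & I(\mathbf{p}_j^{\alpha}, \sigma_j) > I(\mathbf{p}_i^{\alpha}, \sigma_i) \\
0, & \text{otherwise}
\end{cases}
\qquad \forall\, (\mathbf{p}_i^{\alpha}, \mathbf{p}_j^{\alpha}) \in \mathcal{S}
```

In that paper's notation, i and j label the two sampling points of a pair, α is the estimated keypoint orientation used to rotate the sampling pattern, I(p, σ) is the Gaussian-smoothed intensity at a sampling point, and S is the set of short-distance pairs. Concatenating the bits over all pairs in S gives the binary string, which can then be compared between frames with the Hamming distance.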
III. Algorithm
After frames were extracted from the video, each frame was cropped to a
square so that the circular edge of the endoscope image would not introduce
noise. The program then enters a WHILE loop that extracts features from each
frame using a manually tuned feature detection algorithm. The loop also
keeps track of the current frame count and of the features extracted from
the previous and current frames. The extracted features are matched between
the current and previous frames, and the resulting match locations feed an
IF statement: if the number of matched features is equal in the previous and
current match results AND the maximum pixel difference is greater than 5,
the current frame is saved as a ‘.jpeg’ file in the designated folder. The
code was implemented using the MATLAB Computer Vision System Toolbox.
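The keep-or-skip decision described above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code used for the poster. The function names (max_displacement, should_keep, filter_frames) and the demo data are hypothetical, feature extraction and matching are assumed to have already produced paired (x, y) locations, and "maximum pixel difference" is interpreted here as the largest Euclidean shift between matched locations, which is an assumption.

```python
import math

def max_displacement(prev_pts, curr_pts):
    """Largest Euclidean shift, in pixels, between paired match locations."""
    return max(math.hypot(c[0] - p[0], c[1] - p[1])
               for p, c in zip(prev_pts, curr_pts))

def should_keep(prev_pts, curr_pts, min_shift=5.0):
    """Keep a frame only if the match lists pair up and something moved enough."""
    # Assumption: an empty or unequal match result means the frame is skipped.
    if not prev_pts or len(prev_pts) != len(curr_pts):
        return False
    return max_displacement(prev_pts, curr_pts) > min_shift

def filter_frames(matches_per_frame, min_shift=5.0):
    """Return the 0-based indices of the frames that pass the test."""
    kept = []
    for idx, (prev_pts, curr_pts) in enumerate(matches_per_frame):
        if should_keep(prev_pts, curr_pts, min_shift):
            kept.append(idx)
    return kept

# Hypothetical match locations for three consecutive frame pairs.
demo_matches = [
    ([(10, 10), (20, 20)], [(10, 11), (20, 21)]),  # 1 px shift: redundant
    ([(10, 10), (20, 20)], [(10, 18), (20, 27)]),  # 8 px shift: keep
    ([(10, 10)], []),                              # unequal counts: skip
]
kept = filter_frames(demo_matches)
```

With the demo data only the second frame pair passes, so the returned list holds the single index 1. The 5-pixel default mirrors the threshold quoted above; in practice it would be tuned together with the detector parameters.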
IV. Results
After fine-tuning the various parameters and threshold values, the filtering
was successful for both methods. For SURF, the metric threshold was set to
750, the number of octaves to 4, and the number of scale levels to 4. For
BRISK, the minimum contrast was set to 0.05, the minimum quality to 0.1, and
the number of octaves to 4. From the 7045 original frames, SURF and BRISK
filtered the video down to 379 frames and 316 frames, respectively.
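To put the frame counts in perspective, the short Python sketch below works out the fraction of frames each method retained and the approximate video length implied by the 30 fps capture rate stated in the Background. The variable and function names are illustrative only; the numbers are the ones reported above.

```python
TOTAL_FRAMES = 7045   # original frames in the bladder phantom video
FPS = 30              # capture rate stated in the Background section
KEPT = {"SURF": 379, "BRISK": 316}

def retention_percent(kept_frames, total_frames):
    """Percentage of the original frames that survived filtering."""
    return 100.0 * kept_frames / total_frames

# Approximate video duration in seconds.
video_seconds = TOTAL_FRAMES / FPS

# Retention rounded to one decimal place, per method.
summary = {name: round(retention_percent(n, TOTAL_FRAMES), 1)
           for name, n in KEPT.items()}
```

Both methods keep roughly one frame in twenty (about 5.4% for SURF and 4.5% for BRISK) from a video of a little under four minutes.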