Feature pyramid networks for object detection - heedaeKwon
This document discusses feature pyramid networks for object detection. It introduces feature pyramid networks which use a bottom-up pathway to generate feature maps at multiple scales from a convolutional neural network and a top-down pathway that combines high-level and low-level semantic information. It then describes applying feature pyramid networks to region proposal networks and Fast/Faster R-CNN models for object detection and presents experimental results on using feature pyramid networks for region proposal and object detection.
The document discusses several methods for aerial object detection:
1. ClusDet proposes a cluster proposal sub-network and scale network to detect sparse and clustered objects.
2. RoI Transformer introduces an RRoI learner and rotated ROI pooling to efficiently detect oriented objects.
3. SCRDet uses a sampling fusion network and multi-dimensional attention network to detect small, cluttered objects of arbitrary orientation.
4. GcGAN employs geometric consistency constraints to perform domain adaptation for aerial images accounting for geometric transformations.
5. CBAM is a convolutional block attention module tested on MS COCO for feature attention.
This document summarizes two papers on text detection in natural images:
1. SegLink detects text by decomposing it into locally detectable segments and links between segments.
2. R2CNN improves on angle stability by setting the target angle as box coordinates and using different ROI pooling sizes and inclined non-maximum suppression. It achieves state-of-the-art results on standard datasets.
Tutorial on Object Detection (Faster R-CNN) - Hwa Pyung Kim
The document describes Faster R-CNN, an object detection method that uses a Region Proposal Network (RPN) to generate region proposals from feature maps, pools features from each proposal into a fixed size using RoI pooling, and then classifies and regresses bounding boxes for each proposal using a convolutional network. The RPN outputs objectness scores and bounding box adjustments for anchor boxes sliding over the feature map, and non-maximum suppression is applied to reduce redundant proposals.
Fast R-CNN is a method that improves object detection speed and accuracy over previous methods like R-CNN and SPPnet. It uses a region of interest pooling layer and multi-task loss to jointly train a convolutional neural network for classification and bounding box regression in a single stage of training. This allows the entire network to be fine-tuned end-to-end for object detection, resulting in faster training and testing compared to previous methods while achieving state-of-the-art accuracy on standard datasets. Specifically, Fast R-CNN trains 9x faster than R-CNN and runs 200x faster at test time.
Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach - Universitat de Barcelona
1. The document presents an unsupervised approach using convolutional neural networks to generate patch-level descriptors for image retrieval.
2. It trains a convolutional kernel network on unlabeled image patches to learn feature representations in a kernel space without requiring manual labels.
3. Experiments show the convolutional kernel descriptors achieve similar or better performance than supervised convolutional neural networks on standard patch and image retrieval datasets while requiring less training time.
Deep image retrieval - learning global representations for image search - ub ... - Universitat de Barcelona
This document summarizes a research paper on deep image retrieval using global image representations. It presents three key ideas: 1) A siamese network trained with a triplet loss to learn image representations optimized for retrieval. 2) Replacing rigid region grids with a region proposal network to localize regions of interest. 3) Experiments showing their method outperforms classification features and achieves state-of-the-art results on standard retrieval datasets. Their work demonstrates an effective and scalable approach to image retrieval based on learning compact global image signatures.
Visual odometry & SLAM utilizing indoor structured environments - NAVER Engineering
Visual odometry (VO) and simultaneous localization and mapping (SLAM) are fundamental building blocks for various applications from autonomous vehicles to virtual and augmented reality (VR/AR).
To improve the accuracy and robustness of the VO & SLAM approaches, we exploit multiple lines and orthogonal planar features, such as walls, floors, and ceilings, common in man-made indoor environments.
We demonstrate the effectiveness of the proposed VO & SLAM algorithms through an extensive evaluation on a variety of RGB-D datasets and compare with other state-of-the-art methods.
Objects as points (CenterNet) review [CDM] - Dongmin Choi
The document proposes representing objects as single center points rather than bounding boxes. This allows detecting objects through keypoint estimation using a single neural network without post-processing. The method, called CenterNet, predicts center points along with object properties like size in one forward pass. Experiments show CenterNet runs in real-time and is simpler, faster and more accurate than two-stage detectors that require additional pre and post-processing steps. It provides a new direction for real-time object recognition.
http://imatge-upc.github.io/retrieval-2017-cam/
Image retrieval in realistic scenarios targets large dynamic datasets of unlabeled images. In these cases, training or fine-tuning a model every time new images are added to the database is neither efficient nor scalable.
Convolutional neural networks trained for image classification over large datasets have proven to be effective feature extractors when transferred to the task of image retrieval. The most successful approaches are based on encoding the activations of convolutional layers, as they convey the image's spatial information. Our proposal goes beyond this and aims at a local-aware encoding of these features depending on the predicted image semantics, with the advantage of using only the knowledge contained inside the network.
In particular, we employ Class Activation Maps (CAMs) to obtain the most discriminative regions from a semantic perspective. Additionally, CAMs are also used to generate object proposals during an unsupervised re-ranking stage after a first fast search.
Our experiments on two publicly available datasets for instance retrieval, Oxford5k and Paris6k, demonstrate that our system is competitive and even outperforms the current state of the art when using off-the-shelf models trained on the object classes of ImageNet.
[unofficial] Pyramid Scene Parsing Network (CVPR 2017) - Shunta Saito
Pyramid Scene Parsing Network introduces the Pyramid Pooling Module to improve semantic segmentation. The module captures context at different regions and scales by performing average pooling at different pyramid levels on the final convolutional feature map. Experiments on ADE20K and PASCAL VOC datasets show the Pyramid Pooling Module improves mean Intersection-over-Union by over 4% compared to global average pooling, achieving state-of-the-art performance.
Building and road detection from large aerial imagery - Shunta Saito
This document presents a convolutional neural network approach for simultaneously detecting buildings and roads from aerial imagery in 3 channels. The CNN is trained on image patches from a dataset of 147 aerial images and corresponding 3-channel label maps containing buildings, roads, and other labels. Several CNN architectures are tested on 10 held-out images, with the basic architecture achieving the best precision of 0.8905 and 0.9241 for roads and buildings, respectively, outperforming a previous approach. The proposed method requires no pre-processing or hand-designed image features as the CNN is able to learn good feature extractors automatically through training.
Camera-based road lane detection by deep learning III - Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
[PaperReview] LightGCN: Simplifying and Powering Graph Convolution Network fo... - Zimin Park
This document summarizes the LightGCN recommendation model. It first reviews graph convolutional networks (GCNs) and the NGCF model. It then introduces the key components of LightGCN, including self-connection normalization and removing non-linear activations and feature transformations. LightGCN simplifies GCNs for recommendation by focusing on essential components. It provides ablation studies and comparisons to NGCF and other models to demonstrate LightGCN's effectiveness with less complexity.
Locating objects in images (“detection”) quickly and efficiently enables object tracking and counting applications on embedded visual sensors (fixed and mobile). By 2012, progress on techniques for detecting objects in images – a topic of perennial interest in computer vision – had plateaued, and techniques based on histogram of oriented gradients (HOG) were state of the art. Soon, though, convolutional neural networks (CNNs), in addition to classifying objects, were also beginning to become effective at simultaneously detecting objects. Research in CNN-based object detection was jump-started by the groundbreaking region-based CNN (R-CNN). We’ll follow the evolution of neural network algorithms for object detection, starting with R-CNN and proceeding to Fast R-CNN, Faster R-CNN, “You Only Look Once” (YOLO), and up to the latest Single Shot Multibox detector. In this talk, we’ll examine the successive innovations in performance and accuracy embodied in these algorithms – which is a good way to understand the insights behind effective neural-network-based object localization. We’ll also contrast bounding-box approaches with pixel-level segmentation approaches and present pros and cons.
This document summarizes a method for hyperspectral target detection using local background suppression. It presents a new algorithm called Local Background Subspace Estimation (LBSE) that estimates the local background subspace in an adaptive, automatic way tailored to spatial variability in backgrounds. LBSE is shown to outperform existing global and local background suppression methods on both simulated and real hyperspectral data, with its local approach properly detecting targets with low residual energy and adapting to spatially varying background complexity within scenes.
This document proposes using a deep belief network (DBN) to learn depth perception from optical flow information. It describes:
1) Using motion parallax and optical flow cues to perceive depth in humans and insects.
2) Generating labeled training data from 3D graphics scenes to teach the DBN the mapping from motion to depth.
3) The DBN architecture, which takes motion energy maps as input and uses multiple hidden layers and backpropagation to predict depth maps.
4) Test results showing the DBN achieves a higher R^2 score for depth prediction than other models like linear regression.
RegNet: Multimodal Sensor Registration Using Deep Neural Networks
CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration with Geometric Deep Learning and Generative Model
CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional Neural Network and Geometric Constraints
LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
CFNet: LiDAR-Camera Registration Using Calibration Flow Network
Focal Loss for Dense Object Detection proposes a novel focal loss function to address the extreme foreground-background class imbalance encountered in training dense object detectors. The focal loss focuses training on hard examples and prevents easy negatives from overwhelming the detector. RetinaNet, a simple dense detector designed with a ResNet-FPN backbone and focal loss, achieves state-of-the-art accuracy while running faster than existing two-stage detectors. Extensive experiments demonstrate the focal loss enables training highly accurate dense detectors on datasets with vast numbers of background examples like COCO.
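For reference, a minimal NumPy sketch of the binary focal loss described above, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the commonly used settings alpha = 0.25 and gamma = 2; the toy inputs are illustrative only.

```python
# Minimal sketch of the focal loss: easy examples are down-weighted so that
# the many easy negatives do not dominate training.
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-6):
    """p: predicted foreground probabilities, y: binary labels."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return (-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)).mean()

y = np.array([1, 0, 0, 0, 0, 0, 0, 0])
easy = np.array([0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
hard = np.array([0.3, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
print(focal_loss(easy, y), focal_loss(hard, y))  # hard examples contribute far more
```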
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013 - Sunando Sengupta
1) Given a sequence of stereo images, the pipeline generates a dense 3D semantic model of the urban environment.
2) Depth maps are generated from stereo images and fused into a volumetric representation using camera poses from feature tracking.
3) Semantic segmentation of street view images is done using a CRF model, and labels are projected onto the 3D model faces to generate the semantic model.
4) The semantic model is evaluated by projecting it back to the input images and calculating metrics like recall and intersection over union. Future work includes real-time implementation and combining image and geometric context.
This document provides information on several remote sensing projects from IEEE 2015. It lists the titles, languages, and abstracts for 8 projects related to classification and analysis of hyperspectral and multispectral images. The projects focus on techniques such as sparse representation in tangent space, Gabor feature-based collaborative representation, level set evolutions for object extraction, and dimension reduction using spatial and spectral regularization.
This document provides an overview of graph representation learning and various methods for learning embeddings of nodes in graph-structured data. It introduces shallow methods like DeepWalk and Node2Vec that learn embeddings by generating random walks. It then discusses deep methods like graph convolutional networks (GCN) and GraphSAGE that learn embeddings through neural network aggregation of node neighborhoods. Graph attention networks are also introduced as a learnable aggregator for GCN. Finally, applications of these methods at Pinterest for pin recommendation and at Uber Eats for dish recommendation are briefly described.
Comparative Study of Object Detection Algorithms - IRJET Journal
This document compares different object detection algorithms that use convolutional neural networks: Single Shot Detector (SSD), Faster R-CNN, and R-FCN. These algorithms are evaluated based on their speed and accuracy when combined with different feature extractors like VGG-16, ResNet-101, Inception ResNet, and MobileNet. The algorithms are trained on the COCO dataset and their performance is measured using mean average precision (mAP). SSD is found to be the fastest since it performs all computations in one network without needing region proposals. However, Faster R-CNN and R-FCN achieve higher accuracy. The best combinations are found to be Faster R-CNN with ResNet-101 and R-FCN with ResNet
The document discusses content-based image retrieval. It begins with an overview of the problem of using a query image to retrieve similar images from a large dataset. Common techniques discussed include using SIFT features with bag-of-words models or convolutional neural network (CNN) features. The document outlines the classic SIFT retrieval pipeline and techniques for using features from pre-trained CNNs, such as max-pooling features from convolutional layers or encoding them with VLAD. It also discusses learning image representations specifically for retrieval using methods like the triplet loss to learn an embedding space that clusters similar images. The state-of-the-art methods achieve the best performance by learning global or regional image representations from CNNs trained on large, generated datasets
Camera-based lane detection by deep learning - Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
- R-CNN was the first CNN model to achieve high performance in object detection. It used a multi-stage pipeline involving region proposals, feature extraction via CNN, and SVM classification. It was slow due to computing CNN features for each region individually.
- Fast R-CNN improved on R-CNN by introducing a ROI pooling layer to share computation and enabling end-to-end training. However, region proposals were still generated externally, slowing down detection.
- Faster R-CNN addressed this by introducing a Region Proposal Network to generate proposals, allowing the entire model to be trained end-to-end. This led to faster and more accurate detection compared to previous models.
- YOLO
Slides by Miriam Bellver from the Computer Vision Reading Group at the Universitat Politecnica de Catalunya about the paper:
Lu, Yongxi, Tara Javidi, and Svetlana Lazebnik. "Adaptive Object Detection Using Adjacency and Zoom Prediction." CVPR 2016
Abstract:
State-of-the-art object detection systems rely on an accurate set of region proposals. Several recent methods use a neural network architecture to hypothesize promising object locations. While these approaches are computationally efficient, they rely on fixed image regions as anchors for predictions. In this paper we propose to use a search strategy that adaptively directs computational resources to sub-regions likely to contain objects. Compared to methods based on fixed anchor locations, our approach naturally adapts to cases where object instances are sparse and small. Our approach is comparable in terms of accuracy to the state-of-the-art Faster R-CNN approach while using two orders of magnitude fewer anchors on average. Code is publicly available.
Lidar for Autonomous Driving II (via Deep Learning) - Yu Huang
The document outlines research on using LiDAR data for autonomous vehicle object detection. It begins with an introduction to sensor fusion techniques using LiDAR and camera data. Several deep learning approaches for 3D object detection from LiDAR point clouds are then summarized, including methods that project the point cloud into 2D feature maps or 3D voxel grids as input to convolutional networks. Finally, techniques for exploiting HD maps and performing real-time on-device detection are discussed. The document provides an overview of the state-of-the-art in LiDAR-based object detection for autonomous driving applications.
Neural Radiance Fields (NeRF) represents scenes as neural radiance fields that can be used for novel view synthesis. NeRF learns a continuous radiance field from a sparse set of input views using a multi-layer perceptron that maps 5D coordinates to RGB color and density values. It uses volumetric rendering to integrate these values along camera rays and optimizes the network via differentiable rendering and a reconstruction loss. NeRF produces high-fidelity novel views and has inspired extensions like handling dynamic scenes and reconstructing scenes from unstructured internet photos.
PCA-SIFT: A More Distinctive Representation for Local Image Descriptors - wolf
PCA-SIFT is a modification of SIFT that uses principal component analysis (PCA) to build more distinctive local image descriptors. It constructs a projection matrix from a large set of image patches, then projects each keypoint descriptor through this matrix to a compact vector of the top n principal components. This provides a more discriminative representation than SIFT while reducing descriptor dimensionality, leading to improved matching accuracy and efficiency. Evaluation on controlled transformation and graffiti datasets shows PCA-SIFT achieves higher recall rates at equivalent or lower false positive rates compared to SIFT.
Conditional Image Generation with PixelCNN Decoders - suga93
The document summarizes research on conditional image generation using PixelCNN decoders. It discusses how PixelCNNs sequentially predict pixel values rather than the whole image at once. Previous work used PixelRNNs, but these were slow to train. The proposed approach uses a Gated PixelCNN that removes blind spots in the receptive field by combining horizontal and vertical feature maps. It also conditions PixelCNN layers on class labels or embeddings to generate conditional images. Experimental results show the Gated PixelCNN outperforms PixelCNN and achieves performance close to PixelRNN on CIFAR-10 and ImageNet, while training faster. It can also generate portraits conditioned on embeddings of people.
This document discusses object detection in images using deep convolutional neural networks. It begins by framing object detection as classification at multiple positions and scales. The document then reviews early approaches like HOG and deformable part models before introducing R-CNN and its improvements, Fast R-CNN and Faster R-CNN, which share computation between proposals. Faster R-CNN introduces a region proposal network to generate proposals. Finally, it briefly discusses one-stage detectors like YOLO and SSD that directly predict boxes and classes.
This document proposes a method for remote sensing image retrieval using convolutional neural networks with weighted distance and result re-ranking. It has two stages: 1) An offline stage where a pre-trained CNN is fine-tuned on labeled images to extract features for the retrieval dataset. 2) An online stage where the fine-tuned CNN extracts features from a query image and calculates weighted distances to retrieved images, giving more preference to images from similar classes to the query. Experiments on two datasets show the method improves retrieval performance compared to state-of-the-art methods.
To keep up with recent research trends - focusing on Deep Learning - Hiroshi Fukui
This document summarizes key developments in deep learning for object detection from 2012 onwards. It begins with a timeline showing that 2012 was a turning point, as deep learning achieved record-breaking results in image classification. The document then provides overviews of 250+ contributions relating to object detection frameworks, fundamental problems addressed, evaluation benchmarks and metrics, and state-of-the-art performance. Promising future research directions are also identified.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
This document discusses deep learning techniques for object detection and recognition. It provides an overview of computer vision tasks like image classification and object detection. It then discusses how crowdsourcing large datasets from the internet and advances in machine learning, specifically deep convolutional neural networks (CNNs), have led to major breakthroughs in object detection. Several state-of-the-art CNN models for object detection are described, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. The document also provides examples of applying these techniques to tasks like face detection and detecting manta rays from aerial videos.
Slides for a study session given by Ryosuke Sasaki at Arithmer Inc.
They summarize recent methods for object pose estimation in robotics using deep learning.
He entered the Ph.D. course at the University of Tokyo in April 2020.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.
The document discusses the Scale Invariant Feature Transform (SIFT) algorithm which has 3 main steps: 1) Interest point detection using scale-space extrema of the scale-normalized Laplacian to find keypoints invariant to scale and orientation, 2) Generating a feature vector descriptor for each keypoint based on orientation, contrast normalization and local gradient directions, and 3) Matching descriptors between images after transforming to be invariant to affine changes.
Learning joint 2D-3D representations for depth completion - ssuser456ad6
The document discusses a method for depth completion using a neural network that learns joint 2D-3D representations. It introduces depth completion and related work in depth estimation from RGB data and depth completion from RGBD data. It then describes the proposed method, which uses a 2D-3D Fuse Block that learns joint 2D and 3D representations and stacks these blocks into a network for learning and inference of depth completion.
This document discusses guided image filtering. It introduces the guided filter, which performs edge-preserving smoothing while maintaining the gradient of a guidance image. The guided filter works by assuming a local linear model between the guidance image and filtering output within a window, and solving a cost function to determine the filter coefficients. It can perform edge-preserving smoothing and gradient-preserving filtering in linear time complexity.
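For reference, a minimal NumPy/SciPy sketch of the grayscale guided filter described above: a local linear model q = a*I + b is fit in each window using box (mean) filters, so the cost is independent of the window radius. The radius and eps values are illustrative.

```python
# Sketch of the grayscale guided filter using box filters.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=8, eps=1e-3):
    """I: guidance image, p: image to filter (both float arrays in [0, 1])."""
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size)        # box (mean) filter
    mean_I, mean_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - mean_I * mean_p
    var_I = mean(I * I) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)                      # per-window linear coefficient
    b = mean_p - a * mean_I
    return mean(a) * I + mean(b)                    # q_i = mean(a)_i * I_i + mean(b)_i

img = np.random.rand(64, 64)
smoothed = guided_filter(img, img)                  # self-guided, edge-preserving smoothing
```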
Fast cost volume filtering for visual correspondence and beyond - ssuser456ad6
This document discusses fast cost-volume filtering techniques for visual correspondence tasks like stereo matching. It notes that global matching algorithms are slow while local matching lacks accuracy, requiring post-processing. It proposes to smooth the cost volume with a weighted box filter to overcome these issues, allowing for fast and accurate stereo matching through cost-volume filtering and aggregation.
D2-Net: A trainable CNN for joint description and detection of local features - ssuser456ad6
The document presents D2-Net, a trainable CNN that can jointly perform local feature detection and description. D2-Net takes an image as input and outputs a feature map for detection along with descriptors. It uses soft detection to learn detections and descriptions jointly end-to-end. The network is optimized using losses that encourage repeatable detections and discriminative descriptors.
The document presents a method for visualizing and understanding generative adversarial networks (GANs). It introduces a technique called "dissection" to identify interpretable units related to object concepts in GANs. It also uses "intervention" to directly intervene in the network and identify sets of units that cause certain types of objects to disappear. The method measures causal relationships using intervention by calculating the average causal effect of units on the generation of object classes. It aims to examine the contextual relationship between causal object units and background.
2. Contents
1. EAST: An Efficient and Accurate Scene Text Detector
2. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
3. EAST: An Efficient and Accurate Scene Text Detector
Network Overview and Pipeline (figure): the input image is fed to a multi-channel FCN, which produces multi-oriented, task-wise output boxes.
4. EAST: An Efficient and Accurate Scene Text Detector
Main contributions:
1. Proposes a two-stage (two-step) method: an FCN stage followed by an NMS merging stage.
2. The pipeline is flexible.
6. EAST: An Efficient and Accurate Scene Text Detector
Pipeline (architecture figure): convolutional stem stages (conv1-conv3) feed a series of merging stages (merging1-merging3) and a final output layer. To reduce computation cost, a U-shaped merging path is used rather than the HyperNet-style merging of all feature maps used in PVANet.
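A minimal PyTorch-style sketch of one such merging step is given below; the channel sizes are illustrative, not the paper's exact configuration. The coarser map is upsampled, concatenated with the corresponding stem feature, and fused by 1x1 and 3x3 convolutions.

```python
# Sketch of one U-shape merging step: upsample the coarse map, concatenate
# with the finer stem feature, then fuse with 1x1 and 3x3 convolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergeStep(nn.Module):
    def __init__(self, coarse_ch, fine_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(coarse_ch + fine_ch, out_ch, kernel_size=1)
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                           align_corners=False)
        x = torch.cat([up, fine], dim=1)
        return F.relu(self.fuse(F.relu(self.reduce(x))))

# Example: merge a 1/32-resolution map into a 1/16-resolution stem feature.
coarse = torch.randn(1, 384, 16, 16)
fine = torch.randn(1, 256, 32, 32)
merged = MergeStep(384, 256, 128)(coarse, fine)  # -> (1, 128, 32, 32)
```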
8. EAST: An Efficient and Accurate Scene Text Detector
Label Generation: Score Map Generation
Score map generation: for each vertex $p_i$ of the ground-truth quadrangle, the reference length is

$r_i = \min\big(D(p_i, p_{(i \bmod 4)+1}),\; D(p_i, p_{((i+2) \bmod 4)+1})\big)$

where $D(p_i, p_j)$ is the distance between two vertices. Each edge is then shrunk by moving its two endpoints inward along the edge by $0.3\,r_i$ and $0.3\,r_{(i \bmod 4)+1}$ respectively.
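A small NumPy sketch of the reference lengths $r_i$ and of shrinking a single edge by $0.3\,r$. It is illustrative only; the full label-generation procedure also orders the edges (the longer pair is shrunk first), which is omitted here.

```python
# Sketch of reference lengths and edge shrinking for score-map label generation.
import numpy as np

def reference_lengths(quad):
    """quad: (4, 2) array of vertices in order."""
    d = lambda a, b: np.linalg.norm(a - b)
    r = np.empty(4)
    for i in range(4):
        r[i] = min(d(quad[i], quad[(i + 1) % 4]),   # next vertex: p_{(i mod 4)+1}
                   d(quad[i], quad[(i - 1) % 4]))   # previous vertex: p_{((i+2) mod 4)+1}
    return r

def shrink_edge(quad, r, i, ratio=0.3):
    """Move the two endpoints of edge (p_i, p_{i+1}) inward along the edge."""
    j = (i + 1) % 4
    edge = quad[j] - quad[i]
    length = np.linalg.norm(edge) + 1e-6
    quad = quad.copy()
    quad[i] = quad[i] + ratio * r[i] * edge / length
    quad[j] = quad[j] - ratio * r[j] * edge / length
    return quad

quad = np.array([[0., 0.], [100., 0.], [100., 20.], [0., 20.]])
r = reference_lengths(quad)          # [20, 20, 20, 20]
shrunk = shrink_edge(quad, r, 0)     # top-edge endpoints moved inward by 6 px
```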
9. EAST: An Efficient and Accurate Scene Text Detector
Label Generation: Loss

$L = L_s + \lambda_g L_g$

where $L_s$ is the loss for the score map and $L_g$ is the loss for the geometry. $\hat{Y} = F_s$ is the prediction of the score map and $Y^*$ is the ground truth.
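A minimal NumPy sketch of the combined loss, using the class-balanced cross-entropy that the paper adopts for $L_s$; the geometry loss and $\lambda_g$ are passed in as placeholder values here.

```python
# Sketch of L = L_s + lambda_g * L_g with a class-balanced cross-entropy for
# the score map; beta down-weights the (usually dominant) negative class.
import numpy as np

def balanced_cross_entropy(y_pred, y_true, eps=1e-6):
    beta = 1.0 - y_true.mean()
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(beta * y_true * np.log(y_pred)
             + (1.0 - beta) * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

def total_loss(score_pred, score_gt, geometry_loss, lambda_g=1.0):
    return balanced_cross_entropy(score_pred, score_gt) + lambda_g * geometry_loss

score_gt = np.zeros((128, 128)); score_gt[40:60, 30:90] = 1.0
score_pred = np.clip(score_gt + 0.1 * np.random.rand(128, 128), 0, 1)
print(total_loss(score_pred, score_gt, geometry_loss=0.25))
```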
10. EAST: An Efficient and Accurate Scene Text Detector
Label Generation: Loss

$L = L_s + \lambda_g L_g$, where $L_s$ is the loss for the score map and $L_g$ is the loss for the geometry.

RBOX: $L_g = L_{AABB} + \lambda_\theta L_\theta$

QUAD: $L_g = L_{QUAD}(\hat{Q}, Q^*) = \min_{\tilde{Q} \in P_{Q^*}} \sum_{c_i \in C_{\hat{Q}},\, \tilde{c}_i \in C_{\tilde{Q}}} \frac{\mathrm{smoothed}_{L_1}(c_i - \tilde{c}_i)}{8 \times N_{Q^*}}$
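A NumPy sketch of the QUAD branch of the geometry loss: a smoothed-L1 penalty over the eight vertex coordinates, normalized by $8 \times N_{Q^*}$ (with $N_{Q^*}$ taken as the shortest edge length of the ground-truth quadrangle, as in the paper) and minimized over its equivalent cyclic vertex orderings. The example quadrangles are illustrative.

```python
# Sketch of the QUAD geometry loss.
import numpy as np

def smoothed_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def quad_loss(pred, gt):
    """pred, gt: (4, 2) arrays of quadrangle vertices."""
    edges = np.linalg.norm(gt - np.roll(gt, -1, axis=0), axis=1)
    n_q = edges.min()                      # N_Q*: shortest edge length
    losses = []
    for k in range(4):                     # equivalent cyclic orderings of gt
        gt_k = np.roll(gt, -k, axis=0)
        losses.append(smoothed_l1(pred - gt_k).sum() / (8.0 * n_q))
    return min(losses)

gt = np.array([[0., 0.], [80., 0.], [80., 30.], [0., 30.]])
pred = gt + np.array([[2., -1.], [1., 2.], [-2., 1.], [0., -1.]])
print(quad_loss(pred, gt))
```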
11. EAST: An Efficient and Accurate Scene Text Detector
Locality-Aware NMS
Problem: a naive NMS algorithm runs in $O(n^2)$, where $n$ is the number of candidate geometries, and the geometries from nearby pixels tend to be highly correlated.

Solution: locality-aware NMS. Candidates are merged row by row: with $a = \mathrm{WEIGHTEDMERGE}(g, p)$, then $a_i = V(g)\,g_i + V(p)\,p_i$ and $V(a) = V(g) + V(p)$, where $V(\cdot)$ is the score of a geometry.
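A sketch of the merge step, assuming geometries arrive in row-major (scan-line) order. Here the merged coordinates are kept as a score-weighted average (i.e. divided by $V(g)+V(p)$), which is how the merge is commonly implemented, and the scores are accumulated as on the slide; `should_merge` is a placeholder for an IoU test between the two quadrangles.

```python
# Sketch of locality-aware NMS: merge each candidate into the previously
# merged geometry when they overlap, instead of comparing all pairs.
import numpy as np

def weighted_merge(g, p):
    """g, p: length-9 arrays (8 quadrangle coordinates + score)."""
    merged = np.empty(9)
    merged[:8] = (g[8] * g[:8] + p[8] * p[:8]) / (g[8] + p[8])
    merged[8] = g[8] + p[8]                      # V(a) = V(g) + V(p)
    return merged

def locality_aware_merge(geometries, should_merge):
    """geometries: (N, 9), assumed in row-major (scan-line) order."""
    merged, last = [], None
    for g in geometries:
        if last is not None and should_merge(g, last):
            last = weighted_merge(g, last)
        else:
            if last is not None:
                merged.append(last)
            last = g
    if last is not None:
        merged.append(last)
    return np.array(merged)   # a standard NMS is still applied to this smaller set
```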
14. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
Main contributions:
1. A new joint image cascade and feature pyramid network (ICN and FPN).
2. A DIN module designed as a domain adaptation module.
3. A new loss function that shapes rectangles by constraining the angles between the edges to 90 degrees (a toy angle penalty is sketched below).
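The exact form of this angle-constraining loss is not spelled out on the slides; the following is a purely illustrative toy penalty, not the paper's formulation. It is zero for right-angled corners and grows as corners deviate from 90 degrees.

```python
# Illustrative penalty on quadrangle corner angles that deviate from 90 degrees.
import numpy as np

def angle_penalty(quad):
    """quad: (4, 2) vertices in order. Returns mean |cos(corner angle)|,
    which is 0 for a perfect rectangle and grows as corners skew."""
    penalty = 0.0
    for i in range(4):
        a = quad[(i - 1) % 4] - quad[i]          # edge into the corner
        b = quad[(i + 1) % 4] - quad[i]          # edge out of the corner
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-6)
        penalty += abs(cos)
    return penalty / 4.0

rect = np.array([[0., 0.], [40., 0.], [40., 10.], [0., 10.]])
skewed = np.array([[0., 0.], [40., 5.], [40., 15.], [0., 10.]])
print(angle_penalty(rect), angle_penalty(skewed))   # ~0.0 vs > 0
```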
15. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
ICN, FPN and Deformable Inception Subnetworks
ICN:
• Appropriate weight sharing
• Input images resized by bilinear interpolation
FPN:
• Low-level semantic features from high-resolution maps
• High-level semantic features from low-resolution maps
(A minimal top-down merge is sketched below.)
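A minimal PyTorch-style sketch of an FPN top-down merge combining a high-level, low-resolution map with a lower-level, high-resolution one; the channel widths are illustrative.

```python
# Sketch of an FPN-style top-down merge: a 1x1 lateral convolution brings the
# high-resolution, low-level map to a common width, and the upsampled
# high-level map is added to it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownMerge(nn.Module):
    def __init__(self, lateral_ch, out_ch=256):
        super().__init__()
        self.lateral = nn.Conv2d(lateral_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, top_down, lateral):
        up = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        return self.smooth(self.lateral(lateral) + up)

# High-level semantics from low resolution + low-level detail from high resolution.
p5 = torch.randn(1, 256, 8, 8)        # coarse, semantically strong
c4 = torch.randn(1, 512, 16, 16)      # finer, lower-level backbone feature
p4 = TopDownMerge(512)(p5, c4)        # -> (1, 256, 16, 16)
```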
16. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
ICN, FPN and Deformable Inception Subnetworks
17. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
R-RPN
Characteristics:
1. No distinction between the front and back of objects.
2. Anchors are initialized by dimension clustering, as in YOLOv2 (a clustering sketch follows below).
3. The smooth $l_1$ loss is used to regress the four coordinates.
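A NumPy sketch of YOLOv2-style dimension clustering as referenced for the anchor initialization: k-means over box widths and heights, with $1 - \mathrm{IoU}$ (boxes aligned at a common center) as the distance. The cluster count and sample data are illustrative.

```python
# Sketch of dimension clustering for anchor initialization (YOLOv2 style).
import numpy as np

def iou_wh(wh, centroids):
    """wh: (2,), centroids: (k, 2). IoU of center-aligned boxes."""
    inter = np.minimum(wh[0], centroids[:, 0]) * np.minimum(wh[1], centroids[:, 1])
    union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def dimension_clusters(box_wh, k=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = box_wh[rng.choice(len(box_wh), k, replace=False)]
    for _ in range(iters):
        # Maximizing IoU is equivalent to minimizing the 1 - IoU distance.
        assign = np.array([np.argmax(iou_wh(wh, centroids)) for wh in box_wh])
        new = np.array([box_wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids   # k anchor (width, height) priors

boxes = np.abs(np.random.default_rng(1).normal([30, 60], [10, 25], size=(500, 2))) + 1
print(dimension_clusters(boxes, k=3))
```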
18. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
R-ROI
Characteristics:
1. Penalizes angles that are not 90 degrees.
2. Anchors are initialized by dimension clustering, as in YOLOv2.
3. The smooth $l_1$ loss is used to regress the four coordinates.
24. References
EAST:
PVANet: Deep but lightweight neural networks for real-time object detection.
Balanced cross-entropy: Holistically-nested edge detection; Scene text detection via holistic, multi-channel prediction.
U-shape: U-Net: Convolutional networks for biomedical image segmentation.
Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery:
Soft-NMS: Improving object detection with one line of code.
IoU distance: YOLO9000: Better, faster, stronger.
DIN: Deformable convolutional networks.
Editor's Notes
Inside the DIN, deformable convolutions help apply geometric transformations, and in addition the offset regression property helps localize objects that extend outside the kernel.