Despite the recent successes of 3D reconstruction, most research focuses mainly on acquiring precise geometry.
However, many computer graphics applications such as AR/VR need more than scene geometry: surface color and semantics are required to provide a richer user experience, yet existing 3D reconstruction methods leave such auxiliary information out of consideration.
This talk will present our two approaches to reconstructing the color and semantic information of 3D indoor scenes, as follows:
Junho Jeon, Yeongyu Jung, Haejoon Kim, Seungyong Lee, "Texture map generation for 3D reconstructed scenes", The Visual Computer (CGI 2016), Vol. 32, No. 5, May 2016.
Junho Jeon, Jinwoong Jung, Jungeon Kim, Seungyong Lee, "Semantic Reconstruction: Reconstruction of Semantically Segmented 3D Meshes via Volumetric Semantic Fusion", Computer Graphics Forum (Pacific Graphics 2018), Vol. 37, No. 7, October 2018.
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps - NAVER Engineering
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only a few, possibly ambiguous, cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off: using fewer but larger cells results in lower location accuracy, while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with its respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves state-of-the-art performance in location recognition on multiple benchmark datasets.
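The voting scheme over intersected partitions can be sketched in a few lines. The longitude-only partitionings, boundaries, and classifier scores below are hypothetical toy values for illustration, not the cells or models used in the paper:

```python
from itertools import product

def cell_id(lng, boundaries):
    """Index of the half-open cell [b_i, b_{i+1}) containing lng."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= lng < boundaries[i + 1]:
            return i
    raise ValueError("lng out of range")

# Two hypothetical coarse partitionings of longitude with different
# boundaries; their intersections yield finer regions.
A = [-180, -90, 0, 90, 180]   # 4 coarse cells
B = [-180, -60, 60, 180]      # 3 coarse cells

def fine_class(lng):
    """A fine-grained class is the pair of overlapping coarse cells."""
    return (cell_id(lng, A), cell_id(lng, B))

def predict(scores_a, scores_b):
    """Each coarse classifier votes for the fine classes overlapping its
    cells; the fine class with the highest summed score wins."""
    best, best_score = None, float("-inf")
    for ia, ib in product(range(len(A) - 1), range(len(B) - 1)):
        # keep only geometrically non-empty intersections
        lo, hi = max(A[ia], B[ib]), min(A[ia + 1], B[ib + 1])
        if lo >= hi:
            continue
        s = scores_a[ia] + scores_b[ib]
        if s > best_score:
            best, best_score = (ia, ib), s
    return best
```

For example, `predict([0.1, 0.2, 0.6, 0.1], [0.1, 0.2, 0.7])` selects the intersection of A's third cell and B's third cell, a region narrower than either coarse cell alone.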
SeedNet: Automatic Seed Generation with Deep Reinforcement Learning for Robus... - NAVER Engineering
This paper proposes a seed generation technique based on deep reinforcement learning to solve the interactive segmentation problem. One of the key issues in interactive segmentation is minimizing user intervention. The proposed system generates artificial seeds on the user's behalf; the user only needs to provide initial seed information. Because the ambiguity in defining an optimal seed point makes supervised training difficult, we overcome this with reinforcement learning: we define an MDP tailored to the seed generation problem and successfully train a deep Q-network. Trained on the MSRA10K dataset, our method shows superior performance compared to the inaccurate initial results of existing segmentation algorithms.
We present a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images.
To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category,
we leverage a pyramidal model where affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within deep networks.
PARN estimates residual affine transformations at each level and composes them to estimate final affine transformations.
Furthermore, to overcome the limitations of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging a correspondence consistency across image pairs.
Our method is fully learnable in an end-to-end manner and does not require quantizing infinite continuous affine transformation fields.
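The coarse-to-fine composition of residual affine transformations described above amounts to multiplying homogeneous matrices level by level. The three residual transforms below are hypothetical illustrations, not values from the paper:

```python
def to3x3(a):
    """Lift a 2x3 affine [[a, b, tx], [c, d, ty]] to a homogeneous 3x3."""
    return [a[0][:], a[1][:], [0.0, 0.0, 1.0]]

def compose(f, g):
    """Matrix product f @ g for 3x3 homogeneous affines (apply g, then f)."""
    return [[sum(f[i][k] * g[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical residual affine transforms at coarse, middle, fine levels.
levels = [
    [[1.0, 0.00, 4.0], [0.00, 1.0, -2.0]],  # coarse: mostly translation
    [[1.1, 0.00, 0.5], [0.00, 0.9,  0.0]],  # middle: mild scaling
    [[1.0, 0.05, 0.0], [-0.05, 1.0, 0.1]],  # fine: small shear/rotation
]

# The final transformation is the composition of all residual levels.
final = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for t in levels:
    final = compose(to3x3(t), final)
```

Because each level only refines the previous estimate, the per-level residuals stay small, which is what lets the pyramid impose smoothness naturally.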
This paper targets accurate 3D hand pose estimation from a single depth map. 3D hand pose estimation is a crucial technology for realizing HCI, AR, and similar applications. Although many researchers have proposed methods to improve accuracy, accuracy has remained limited by the similar appearance of fingers, occlusions, and the complexity of diverse finger motions. To overcome the limitations of existing methods, this paper changes the input and output representations they use. Unlike most prior work, which takes a 2D depth image as input and directly regresses the 3D coordinates of hand joints, the proposed model takes a 3D voxelized depth map as input and outputs 3D heatmaps. We use an encoder-decoder 3D CNN for this, and thanks to the changed input and output representations, the proposed model achieves the best performance on three widely used 3D hand pose estimation datasets and one 3D human pose estimation dataset. It also won the HANDS 2017 challenge held at ICCV 2017.
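The change of representation (voxelized depth input, per-joint 3D heatmap output) can be sketched as follows. The grid size, coordinate range, and heatmap values are hypothetical toy choices, and the real model predicts the heatmaps with a 3D CNN rather than receiving them:

```python
def voxelize(points, grid=4, lo=0.0, hi=4.0):
    """Bin 3D points (hypothetical metric range [lo, hi)) into a binary
    occupancy grid of grid^3 cells, the input form of voxel-based models."""
    step = (hi - lo) / grid
    occ = [[[0 for _ in range(grid)] for _ in range(grid)]
           for _ in range(grid)]
    for x, y, z in points:
        i, j, k = (int((v - lo) // step) for v in (x, y, z))
        if all(0 <= t < grid for t in (i, j, k)):
            occ[i][j][k] = 1
    return occ

def heatmap_argmax(heat):
    """Joint estimate = voxel with the highest heatmap response."""
    g = len(heat)
    return max(((i, j, k) for i in range(g)
                for j in range(g) for k in range(g)),
               key=lambda t: heat[t[0]][t[1]][t[2]])

# Toy demo: two depth points and one synthetic per-joint heatmap.
occ = voxelize([(0.5, 0.5, 0.5), (3.5, 0.1, 0.1)])
heat = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
heat[2][1][3] = 0.9   # hypothetical peak response for one joint
```

Reading the joint location off the heatmap peak, instead of regressing coordinates directly, is what makes the output spatially grounded in the input volume.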
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F... - Simone Ercoli
I presented an interesting paper during the Vision and Multimedia Reading Group: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (pdf).
It is a thorough evaluation of features extracted from the activations of a deep convolutional network trained on a large-scale dataset.
This is a work by Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell from UC Berkeley.
Scaling up Deep Learning Based Super Resolution Algorithms - Xiaoyong Zhu
Super-resolution is a process for obtaining one or more high-resolution images from one or more low-resolution observations. It has been used for many applications, including satellite and aerial imaging, medical image processing, ultrasound imaging, line fitting, automated mosaicking, infrared imaging, facial image improvement, text image improvement, compressed image and video enhancement, and fingerprint image enhancement. While research on super-resolution began in the 1970s, recently, with the power of deep learning, many notable new methods have been created, including SRCNN, SRResNet, and lately SRGANs, which use generative adversarial networks. However, since these approaches require a lot of images to train the deep learning network, they are extremely compute-intensive. Fortunately, with the power of the cloud, you can easily scale up the compute resources as needed, making the algorithm converge faster.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Speaker: Bongsoo Choy (Chris Choy, PhD student, Stanford University)
Date: August 2017
Reviewer and workshop organizer for CVPR, ICCV, NIPS, TIP, ICRA, TPAMI, 3DV, etc.
Overview:
3D perception ranges broadly from simple geometric understanding of objects to high-level cognition such as semantic scene understanding and inferring the relationships between objects. In this talk, I'll present a broad class of works, from low-level to high-level cognition tasks, that encompass 3D perception.
Super Resolution in the Deep Learning Era - Jaejun Yoo
Abstract:
Image restoration (IR) is one of the fundamental problems in low-level vision, encompassing denoising, deblurring, super-resolution, and more. In today's talk, I will focus on the super-resolution task. There are two main streams in super-resolution research: traditional model-based optimization and discriminative learning methods. I will present the pros and cons of both methods and their recent developments in the research field. Finally, I will provide a mathematical view that explains both methods in a single holistic framework, while achieving the best of both worlds. The last slide summarizes the remaining problems that are yet to be solved in the field.
Single Image Depth Estimation Using Frequency Domain Analysis and Deep Learning - Ahan M R
Using machine learning and deep learning techniques, we train a ResNet-based CNN for depth estimation with discrete Fourier domain analysis, and present results along with an explanation of the loss function and code snippets.
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad... - Wanjin Yu
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning Part 3: prior embedding deep super resolution
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
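As a back-of-the-envelope check on the parameter count, most of the roughly 60 million parameters sit in the fully-connected layers. The layer shapes below follow the commonly cited AlexNet configuration:

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch

def fc_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

# First conv layer: 96 filters of 11x11 over 3 input channels.
first_conv = conv_params(3, 96, 11)   # 34,944 parameters

# The two largest fully-connected layers dominate the total:
fc6 = fc_params(256 * 6 * 6, 4096)    # ~37.8M (from the last conv volume)
fc7 = fc_params(4096, 4096)           # ~16.8M
```

Together fc6 and fc7 already account for over 54 million parameters, which is why dropout is applied precisely in the fully-connected layers.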
Attentive Semantic Alignment with Offset-Aware Correlation Kernels - NAVER Engineering
Semantic correspondence is the problem of establishing correspondences across images depicting different instances of the same object or scene class. One recent approach to this problem is to estimate the parameters of a global transformation model that densely aligns one image to the other. Since an entire correlation map between all feature pairs across images is typically used to predict such a global transformation, noisy features from different backgrounds, clutter, and occlusion distract the predictor from correctly estimating the alignment. This is a challenging issue, in particular in semantic correspondence, where a large degree of image variation is often involved. In this paper, we introduce an attentive semantic alignment method that focuses on reliable correlations, filtering out distractors. For effective attention, we also propose an offset-aware correlation kernel that learns to capture translation-invariant local transformations when computing correlation values over spatial locations. Experiments demonstrate the effectiveness of the attentive model and offset-aware kernel, and the proposed model combining both techniques achieves state-of-the-art performance.
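The correlation map the abstract refers to is, at its core, a similarity score between every pair of spatial positions across the two feature maps. A minimal sketch with tiny hypothetical features, and without the proposed offset-aware kernel or attention:

```python
def correlation_map(feat_a, feat_b):
    """Dense normalized correlation between two feature maps, each given
    as a list of (position, feature-vector) pairs: one score per pair."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    def norm(u):
        return dot(u, u) ** 0.5 or 1.0   # guard against zero vectors
    return {(pa, pb): dot(va, vb) / (norm(va) * norm(vb))
            for pa, va in feat_a for pb, vb in feat_b}

# Toy 2-channel features at a couple of spatial positions.
feat_a = [((0, 0), [1.0, 0.0]), ((0, 1), [0.0, 1.0])]
feat_b = [((0, 0), [1.0, 0.0])]
corr = correlation_map(feat_a, feat_b)
```

A noisy background feature produces spurious high entries in this map, which is exactly what the paper's attention mechanism learns to suppress.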
Convolutional Neural Networks: Popular Architectures - ananth
In this presentation we look at some of the popular architectures, such as ResNet, that have been successfully used for a variety of applications. Starting from AlexNet and VGG, which showed that deep learning architectures can deliver unprecedented accuracies for image classification and localization tasks, we review more recent architectures such as ResNet, GoogLeNet (Inception), and the more recent SENet, which have won ImageNet competitions.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they're most valuable and a number of appropriate mental models for understanding deep learning.
Modern Convolutional Neural Network Techniques for Image Segmentation - Gioele Ciaparrone
Recently, convolutional neural networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that have increased accuracy in such tasks. First we describe the Inception architecture and its evolution, which allowed increasing the width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks able to perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique to further improve the state of the art on various fine-grained localization tasks.
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Semantic Segmentation on Satellite Imagery - RAHUL BHOJWANI
This is an image semantic segmentation project targeting satellite imagery. The goal was to predict pixel-wise segmentation maps for various objects in satellite imagery, including buildings, water bodies, and roads. The data was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented the FCN, U-Net, and SegNet deep learning architectures for this task.
Enhanced Deep Residual Networks for Single Image Super-Resolution - NAVER Engineering
Speaker: Heewon Kim (PhD student, Seoul National University)
Date: September 2017
Currently in the combined MS/PhD program in Electrical and Computer Engineering, Seoul National University
Best Paper Award of NTIRE 2017 Workshop: Challenge Track
Overview:
Single-image super-resolution is a research field that restores a low-resolution image to its high-resolution original. Common real-world examples include keeping a small region of a social media photo sharp when it is greatly enlarged, or producing a full-resolution image from a thumbnail.
In this talk, after reviewing research directions before and after deep learning, we will examine our team's work, which won the 2nd NTIRE Workshop Challenge at CVPR 2017, focusing on an analysis of the network architecture.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis - taeseon ryu
This paper is a 3D-aware model. With StyleGAN, when you want to edit a particular feature, you can find the latent vector corresponding to the input and modify that latent vector to change the corresponding feature, such as the mouth. Building on this concept,
the GANSpace paper attempted to edit even spatial information given an input. Looking at the results, rotation appears to be reasonably well learned, but the output is sometimes perceived as a different person. This is what we mean by the features not being disentangled: instead of changing only the desired feature, other features change as well. This paper was created to address that problem by making the model understand 3D more effectively.
From Experimentation to Production: The Future of WebGL - FITC
Presented at FITC Toronto 2017
More info at http://fitc.ca/event/to17/
Hector Arellano, Firstborn
Morgan Villedieu, Firstborn
Overview
You don’t need an advanced degree in graphics engineering to use WebGL as a robust solution in your web design and development. During this talk you will discover how to harness the power of WebGL for real-world application.
Objective
Discover real-world applications for advanced WebGL techniques
Target Audience
Designers or developers excited to conquer the complexity associated with WebGL
Five Things Audience Members Will Learn
Explore the outer limits of physics effects, shaders and experimentation
Understand how these techniques can be applied to transform 3D to 2D shadows and post-processing
Render real-time liquid in WebGL
Use DOM as a texture so you get the power of WebGL without having to worry about a fallback system
Master the basics by utilizing libraries
a collection of terminologies used in the game development industry, from my point of view any one who intends to work in that business should understand them.
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...Gurbinder Gill
In this presentation we propose the parallel implementation of template matching using Full Search using NCC as a measure using the concept of pre-computed sum-tables referred to as FNCC for high resolution images on NVIDIA’s Graphics Processing Units (GP-GPU’s)
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...Electronic Arts / DICE
Global illumination (GI) has been an ongoing quest in games. The perpetual tug-of-war between visual quality and performance often forces developers to take the latest and greatest from academia and tailor it to push the boundaries of what has been realized in a game product. Many elements need to align for success, including image quality, performance, scalability, interactivity, ease of use, as well as game-specific and production challenges.
First we will paint a picture of the current state of global illumination in games, addressing how the state of the union compares to the latest and greatest research. We will then explore various GI challenges that game teams face from the art, engineering, pipelines and production perspective. The games industry lacks an ideal solution, so the goal here is to raise awareness by being transparent about the real problems in the field. Finally, we will talk about the future. This will be a call to arms, with the objective of uniting game developers and researchers on the same quest to evolve global illumination in games from being mostly static, or sometimes perceptually real-time, to fully real-time.
This presentation was given at SIGGRAPH 2017 by Colin Barré-Brisebois (EA SEED) as part of the Open Problems in Real-Time Rendering course.
Past, Present and Future Challenges of Global Illumination in GamesColin Barré-Brisebois
Global illumination (GI) has been an ongoing quest in games. The perpetual tug-of-war between visual quality and performance often forces developers to take the latest and greatest from academia and tailor it to push the boundaries of what has been realized in a game product. Many elements need to align for success, including image quality, performance, scalability, interactivity, ease of use, as well as game-specific and production challenges.
First we will paint a picture of the current state of global illumination in games, addressing how the state of the union compares to the latest and greatest research. We will then explore various GI challenges that game teams face from the art, engineering, pipelines and production perspective. The games industry lacks an ideal solution, so the goal here is to raise awareness by being transparent about the real problems in the field. Finally, we will talk about the future. This will be a call to arms, with the objective of uniting game developers and researchers on the same quest to evolve global illumination in games from being mostly static, or sometimes perceptually real-time, to fully real-time.
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupLihang Li
This is the slides about DTAM for my group meeting report, hope it does help to anyone who will want to implement DTAM and need to understand it deeply.
Similar to Color and 3D Semantic Reconstruction of Indoor Scenes from RGB-D stream (20)
비행기 설계를 왜 통일 해야 할까?
디자인 시스템을 하는 이유
비행기들이 다 용도가 다르다...어떻게 설계하지?
맥락이 다른 페이지와 패턴
경유지까지 아직 멀었다... 언제 수리하지?
디자인 시스템을 적용하는 시점
엔지니어랑 얘기해서 정비해야하는데...어떻게 수리하지?
디자인 시스템을 적용하는 프로세스
비행기 설계가 바뀐걸 어떻게 알리지?
디자인 시스템의 전파
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
Color and 3D Semantic Reconstruction of Indoor Scenes from RGB-D stream
1. Color and 3D Semantic Reconstruction
of Indoor Scenes from RGB-D Streams
Junho Jeon (전준호)
CG Lab. POSTECH
Tech Talk @ NAVER
2018.12.10
2. 3D Reconstruction
• Capture shape and appearance of real objects and environments
• Produce 3D models for applications such as virtual/augmented reality, 3D printing
3. 3D Reconstruction using RGB-D Sensor
• Geometric reconstruction methods have developed rapidly and are available for large-scale scenes
▫ But they mainly focus on acquiring accurate geometry
KinectFusion [Newcombe 2011], Voxel hashing [Nießner 2013], Elastic fragments [Zhou 2013], Robust reconstruction [Choi 2015]
4. Auxiliary Information of 3D Indoor Scene
• Surface color
• Object class
• Lighting condition
• Sound
Color and semantic reconstruction → rich UX
5. Contributions – Color Reconstruction
• Texture Map Generation for 3D Reconstructed Scenes
▫ Reconstruct clean and sharp surface color of the 3D reconstructed scene
▫ Light-weight color representation for reconstructed scenes
▫ Texture coordinates optimization to acquire sharp texture map
Texture map generation for 3D reconstruction
6. Contributions – Semantic Reconstruction
• Reconstruction of semantically segmented 3D meshes
▫ Predict per-vertex object class of the 3D reconstructed scene
▫ Volumetric semantic fusion of frame-by-frame semantic predictions
▫ Adaptive integration and CRF optimization for robust labeling
3D semantic reconstruction
7. Texture Map Generation
for 3D Reconstructed Scenes
Junho Jeon, Yeongyu Jung, Haejoon Kim,
Seungyong Lee
The Visual Computer (CGI 2016)
8. 3D Reconstruction using RGB-D Sensor
• Available for very large-scale scenes
▫ But no or inaccurate color information!
Robust reconstruction [Choi 2015], BundleFusion [Dai 2017]
9. Color Reconstruction
• Naïve color blending introduces blurring, ghosting, etc.
▫ Incorrect camera poses
▫ Lens distortions
▫ Misaligned RGB-D images
• Goal: precisely reconstruct the color from RGB-D stream
Blurry color from volumetric blending
10. Previous work: Color Map Optimization
• Zhou and Koltun, TOG 2014
▫ Project RGB stream onto mesh to get vertex color
▫ Optimize camera pose & warping function for images clean vertex color
▫ Limitation: method is based on vertex colors
→ Time-consuming optimization (takes 5 mins.)
→ Inefficient rendering
* Images from Zhou's slides
11. Our Approach
• Color reconstruction based on texture mapping
▫ Generating texture map for simplified mesh
▫ Optimize texture map to maximize photometric consistency
▫ GPU-based parallel solver
100x faster color reconstruction!
Efficient rendering
14. Preprocessing
• Geometric model reconstruction
▫ Dense scene reconstruction with point of interest [Zhou 2013]
▫ Any other 3D reconstruction method can be used
• Model simplification
▫ Original mesh consists of more than 1M faces
→ Inefficient texture mapping
→ Further processing becomes extremely time-consuming
▫ Surface simplification using quadric error metrics [Garland 1997]
Dense scene reconstruction [Zhou 2013]; mesh simplification (460K to 23K faces)
15. Spatiotemporal Key Frame Sampling
[Pipeline figure, stage (2) highlighted: RGB-D stream (color + depth) → (1) preprocessing (simplified 3D reconstructed mesh) → (2) key frame sampling (spatio-temporally sampled key frames) → (3) texture map generation (sub-textures, global texture) → (4) texture map optimization (refined global texture map) → rendering result]
16. Spatiotemporal Key Frame Sampling
• Input color stream
▫ A lot of redundant data, color images suffer from motion blurs
• Temporal sampling
▫ Sample less blurry key frames based on Blurriness [Crété-Roffet 2007]
• Spatial sampling
▫ Uniqueness: the region of an image that cannot be covered by the other images
▫ Sample by iteratively eliminating the image with minimum uniqueness
Temporal sampling with blurriness; overlapping (red) and unique (blue) regions
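The two-stage sampling above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `blurriness` scores stand in for the metric of [Crété-Roffet 2007], and `uniqueness` is assumed precomputed (the actual method would recompute coverage after each elimination).

```python
# Spatiotemporal key frame sampling (illustrative sketch).
# Temporal: within each window of frames, keep the least blurry one.
# Spatial: greedily drop key frames whose content is covered by the others.

def temporal_sampling(frames, blurriness, window=10):
    """Pick the sharpest frame (minimum blurriness score) in each temporal window."""
    keys = []
    for start in range(0, len(frames), window):
        chunk = list(range(start, min(start + window, len(frames))))
        keys.append(min(chunk, key=lambda i: blurriness[i]))
    return keys

def spatial_sampling(keys, uniqueness, min_uniqueness=0.1):
    """Iteratively eliminate the key frame with the smallest uniqueness
    (i.e. the frame most redundantly covered by the remaining frames)."""
    keys = list(keys)
    while len(keys) > 1:
        worst = min(keys, key=lambda i: uniqueness[i])
        if uniqueness[worst] >= min_uniqueness:
            break
        keys.remove(worst)
    return keys
```

In the real pipeline, uniqueness would be re-evaluated after every removal, since eliminating one frame changes what the others must cover.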
17. Texture Map Generation
[Pipeline figure, stage (3) highlighted: RGB-D stream (color + depth) → (1) preprocessing (simplified 3D reconstructed mesh) → (2) key frame sampling (spatio-temporally sampled key frames) → (3) texture map generation (sub-textures, global texture) → (4) texture map optimization (refined global texture map) → rendering result]
18. Texture Map Generation
• UV unwrapping of the mesh for the global texture map
▫ Get global texture coordinates for every vertex
• Estimate color by blending key frames
▫ Sub-texture map by projecting mesh to each camera
▫ Blended sub-texture becomes global texture
[Diagram: mesh → UV unwrapping → global texture coordinates; mesh projection to each key frame → sub-textures → weighted blending → global texture map]
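The blending step above can be sketched per texel as a weighted average of sub-texture samples. This is a minimal sketch; the choice of per-sample weights (e.g. favoring frontal, well-focused views) is an assumption here, not the paper's exact weighting.

```python
# Weighted blending of sub-texture samples into one global texel (illustrative sketch).
# Each key frame i that sees the texel contributes a color c_i with weight w_i.

def blend_texel(samples):
    """samples: list of (color, weight) pairs for one global texel,
    where color is an (r, g, b) tuple. Returns the weighted average color,
    or None if the texel is unobserved."""
    total_w = sum(w for _, w in samples)
    if total_w == 0:
        return None
    return tuple(
        sum(c[k] * w for c, w in samples) / total_w
        for k in range(3)
    )
```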
19. Global Texture Map Optimization
[Pipeline figure, stage (4) highlighted: RGB-D stream (color + depth) → (1) preprocessing (simplified 3D reconstructed mesh) → (2) key frame sampling (spatio-temporally sampled key frames) → (3) texture map generation (sub-textures, global texture) → (4) texture map optimization (refined global texture map) → rendering result]
20. Global Texture Map Optimization
• Generated texture map also suffers from blurring, ghosting, etc.
▫ Inconsistent color blending from different sub-textures
• Optimize sub-texture coordinates to be consistent
→ Sharper & cleaner global texture map
Consistent vs. inconsistent blending
21. Global Texture Map Optimization
• Search new sub-texture coordinates of each vertex
• Energy formulation for photometric consistency
▫ For every face, blended global texture should be consistent with sub-textures
▫ Consider consistency of sampled points on each face
• Non-linear least squares problem
▫ Needs to be solved with the Gauss-Newton method
[Equation annotations: sub-texture coordinates (the variables), sub-texture intensity, blended global texture intensity, sub-textures of face f]
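The energy annotated above can be written out as follows. This is a reconstruction from the slide's labels, not necessarily the paper's exact notation: here G is the blended global texture intensity, T_i the intensity of sub-texture i, S(f) the sub-textures of face f, P(f) the sampled points on face f, and the sub-texture coordinates u_i are the variables.

```latex
E(\{u_i\}) \;=\; \sum_{f} \sum_{p \in P(f)} \sum_{i \in S(f)}
  \big(\, G(p) - T_i(u_i(p)) \,\big)^2
```

Because each T_i is sampled at the optimized coordinates u_i(p), the residuals are non-linear in the variables, which is why a Gauss-Newton solver is needed.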
22. GPU-based Alternating Solver
• Applying the naïve Gauss-Newton method is non-trivial
▫ Infeasible to solve directly due to the # of variables
• Exploit locality of the problem to parallelize the optimization
▫ Assuming the 1-ring neighborhood of v is fixed, the optimization of the sub-texture coordinates u_v is independent of the other vertices
▫ Schwarz Alternating Method
While keeping boundary variables fixed, update the inner variables
Independent optimizations are propagated iteratively
[Figure: 1-ring neighborhood of v; propagation of independent local optimizations]
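The alternating scheme can be illustrated on a toy quadratic energy over a graph (not the texture energy itself): each vertex is updated in closed form with its 1-ring held fixed, and repeated sweeps propagate the updates across the mesh, in the spirit of the Schwarz alternating method.

```python
# Schwarz-style alternating minimization on a toy quadratic energy (illustrative sketch):
#   E(x) = sum_i (x_i - t_i)^2 + sum_{(i,j) in edges} (x_i - x_j)^2
# Each local subproblem (vertex i with its neighbors held fixed) has the closed-form
# update below; independent local solves are swept repeatedly so that updates propagate.

def alternating_solve(targets, edges, sweeps=100):
    n = len(targets)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    x = list(targets)  # initialize at the data term
    for _ in range(sweeps):
        for i in range(n):
            # minimize E over x_i with all neighbors held fixed
            x[i] = (targets[i] + sum(x[j] for j in nbrs[i])) / (1 + len(nbrs[i]))
    return x
```

On a GPU, the per-vertex solves within one sweep can run in parallel over an independent set of vertices, since each update only reads its fixed 1-ring.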
23. Experimental Results
• Tested on various 3D reconstructed models
• Intel i7 4.0 GHz, 16 GB RAM, NVIDIA Titan X
* Models from Zhou et al.
36. Summary
• Texture map generation for color reconstruction of 3D indoor scene
▫ Texture map generation maximizing the photometric consistency of mapping
▫ Spatiotemporal sampling for faster processing and sharper texture map
▫ Efficient optimization based on a parallel Gauss-Newton solver on GPU
→ Directly applicable to computer graphics applications
37. Semantic Reconstruction:
Reconstruction of Semantically Segmented
3D Meshes via Volumetric Semantic Fusion
Junho Jeon, Jinwoong Jung, Jungeon Kim,
Seungyong Lee
Computer Graphics Forum (Pacific Graphics 2018)
38. Reconstruction of Semantic Information
• Virtual/augmented reality → interaction with 3D scenes
• A single connected 3D model is not suitable
• Requires individually segmented object models
Semantic segmentation on 3D reconstructed scene
[Figure: interaction with a 3D scene vs. a single connected 3D model; labeled classes: sofa, floor, shelves, wall]
39. Semantic Segmentation on 2D Image
• Pixel-wise annotation of semantic object class
• Well-established network architectures and datasets
▫ PASCAL, MS COCO, Mapillary, Places, …
• Has shown successful performance
Places and Mapillary datasets
40. 3D Semantic Segmentation
• Point-wise (vertex-wise) annotation on 3D scene model
• Deep learning on 3D data is not straightforward
▫ Unstructured point cloud, mesh with complex topology
• Lack of annotated 3D reconstructed model dataset
▫ Recently, an annotated dataset was released (ScanNet)
[Figure: reconstructed 3D model with per-vertex annotation (floor, bed, wall, chair, picture)]
41. Related Work – 3D CNN-based Methods
• Represent input geometry as a uniform voxel grid
▫ Binary occupancy grid or distance field
• Direct feature extraction and classification w/ 3D CNN
• High memory consumption → only low-resolution segmentations
Fully convolutional 3D CNN architecture (images courtesy of [Qi 2017]); voxel-based semantic segmentation [Dai 2017]
42. Related Work – Point-based Methods
• Unstructured point cloud → ordered sequence of vectors
▫ Point set grouping, slice pooling, max pooling
• Feature extraction and classification w/ MLP or RNN
• Misses geometric detail (may miss small object classes)
PointNet++ [Qi 2017]; RSNet [Huang 2018]
43. Our Approach: Semantic Reconstruction
• 3D (geometry) reconstruction: fusion of multiple geometry measurements (depth images)
• 3D semantic reconstruction: fusion of multiple 2D semantic predictions
[Figures: multiple depth images → dense surface reconstruction; multiple semantic predictions → 3D semantic reconstruction]
44. Volumetric Fusion of Semantic Information
• Review: Volumetric Fusion of 3D Geometry
• Geometry representation using a uniform voxel grid
▫ Each voxel stores TSDF value (geometry information)
Uniform voxel grid
45. Volumetric Fusion of Semantic Information
• Review: Volumetric Fusion of 3D Geometry
• Geometry representation using a uniform voxel grid
▫ Each voxel stores TSDF value (geometry information)
• Merge noisy measurements on a single voxel grid w/ estimated camera poses
▫ Volumetric denoising of the reconstructed geometry (TSDF values)
[Figure: uniform voxel grid; multi-frame geometric fusion (images courtesy of Newcombe's slides)]
46. Volumetric Fusion of Semantic Information
• Each voxel holds a semantic probability distribution (20 classes)
▫ Volumetric fusion of multi-frame semantic predictions
• Seamless integration into the 3D (geometry) reconstruction process
[Pipeline: RGB-D stream → CNN-based 2D semantic segmentation → stream of semantic predictions → volumetric semantic fusion]
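The per-voxel fusion can be sketched as a weighted running average of class distributions, analogous to TSDF weight accumulation. This is a minimal sketch; the per-observation weight `w` is where the depth- and boundary-based reliability weights of the following slides would plug in.

```python
# Volumetric semantic fusion (illustrative sketch): each voxel keeps a class
# probability distribution and an accumulated weight; every frame's projected
# 2D prediction is merged as a weighted running average.

NUM_CLASSES = 20

class SemanticVoxel:
    def __init__(self):
        self.p = [1.0 / NUM_CLASSES] * NUM_CLASSES  # fused distribution (uniform prior)
        self.weight = 0.0

    def integrate(self, prediction, w=1.0):
        """prediction: per-class probabilities from one frame's 2D CNN output,
        projected onto this voxel. w: per-observation reliability weight."""
        if self.weight == 0.0:
            self.p = list(prediction)
        else:
            total = self.weight + w
            self.p = [(self.weight * a + w * b) / total
                      for a, b in zip(self.p, prediction)]
        self.weight += w
```

This update is order-independent up to floating-point error, so predictions can be integrated incrementally as frames arrive, just like geometric TSDF fusion.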
51. CNN-based 2D Semantic Segmentation
• RGB-D semantic segmentation: RDFNet [Park 2017]
• Stream for 3D reconstruction differs from still images
▫ Captured close to objects, may suffer from motion blur
[Figures: images from the ScanNet dataset (reconstruction) vs. the NYU-D dataset (still images)]
52. CNN-based 2D Semantic Segmentation
• RGB-D semantic segmentation: RDFNet [Park 2017]
• Stream for 3D reconstruction differs from still images
▫ Captured close to objects, may suffer from motion blur
• Fine-tuning on the ScanNet dataset [Dai et al. 2017]
▫ Drastically improves segmentation quality
[Figure: input, original RDFNet [Park 2017], fine-tuned RDFNet]
53. Adaptive Volumetric Semantic Fusion
• 2D predictions & camera poses may have errors
▫ Weighted average of class probability for a voxel
Volumetric semantic fusion
54. Adaptive Volumetric Semantic Fusion
• 2D predictions & camera poses may have errors
▫ Weighted average of class probability for a voxel
• Depth-based reliability weight
▫ Network accuracy depends on the pixel depth (i.e., relative scale)
▫ Pixels close to the camera contribute less to the result
55. Adaptive Volumetric Semantic Fusion
• 2D predictions & camera poses may have errors
▫ Weighted average of class probability for a voxel
• Foreground boundary weight
▫ Unreliable predictions around misaligned object boundaries
▫ Prohibit wall/floor labels for foreground object pixels
[Figure: input color, input depth, depth weights, foreground weights; unreliable predictions near the wall (background) / object (foreground) boundary]
56. Reconstruction of Semantically Labeled 3D Mesh
• Marching cubes to extract a reconstructed 3D mesh from the volumetric representation
▫ Bilinear interpolation of fused probabilities at voxels
▫ Each vertex has 20 object class probabilities
57. Reconstruction of Semantically Labeled 3D Mesh
• Marching cubes to extract a reconstructed 3D mesh from the volumetric representation
▫ Bilinear interpolation of fused probabilities at voxels
▫ Each vertex has 20 object class probabilities
• Select maximum probability class for each vertex to obtain an initial segmentation
[Figure: initial segmentation result; probability visualization for major classes (floor, wall, chair, bed, others)]
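The initial labeling step can be sketched as follows: interpolate the fused voxel distributions at a mesh vertex, then take the maximum-probability class. This is a generic sketch; the interpolation weights stand in for the interpolation of fused probabilities the slide describes.

```python
# Initial per-vertex labeling (illustrative sketch): interpolate the fused
# per-class distributions of the voxels surrounding a marching-cubes vertex,
# then select the class with maximum probability.

def interpolate_distributions(dists, weights):
    """Linearly combine several per-class distributions with the given weights."""
    n = len(dists[0])
    return [sum(w * d[k] for d, w in zip(dists, weights)) for k in range(n)]

def vertex_label(dists, weights):
    """Return the index of the maximum-probability class at the vertex."""
    p = interpolate_distributions(dists, weights)
    return max(range(len(p)), key=lambda k: p[k])
```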
58. CRF-based Label Regularization
• Integrated, but noisy 3D segmentation results
▫ Each 2D segmentation individually considers only a limited FOV
• CRF optimization to determine final class labels
• Consider the global context of the reconstructed scene w/ geometry (surface normals), appearance (colors), and semantic similarity using a confusion matrix of the CNN
[Figure: input; naïve vs. adaptive integration (no CRF); final result]
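The regularization can be expressed as a standard pairwise CRF energy over vertex labels. This is a generic sketch; the exact potentials in the paper combine surface normals, colors, and the CNN's confusion matrix as described above.

```latex
E(\ell) \;=\; \sum_{v} \psi_u(\ell_v) \;+\; \sum_{(v,w) \in \mathcal{N}} \psi_p(\ell_v, \ell_w),
\qquad \psi_u(\ell_v) = -\log p_v(\ell_v)
```

Here p_v is the fused class distribution at vertex v, and the pairwise term ψ_p penalizes label disagreement between neighboring vertices, weighted by their geometric (normal), appearance (color), and semantic similarity.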
59. Experimental Setting
• 2D semantic segmentation: RDFNet for RGB-D stream (Caffe) [Park 2017]
• Camera pose estimation & 3D volumetric fusion: BundleFusion [Dai 2017]
• NVIDIA GeForce Titan X with 12GB VRAM
62. Segmentation Result of Large-scale Scenes
• Incremental integration enables semantic reconstruction of large-scale 3D scenes
[Figure: reconstructed scene and segmented result]
65. Quantitative Evaluation
• Global voxel classification accuracy with majority voting
▫ Improves on the previous method by a large gap (+6.86%)
▫ Tested on ScanNet dataset (312 test scenes)
Configurations Accuracy
Voxel-based labeling [Dai 2017] 73.0%
Naïve integration without CRF 79.02%
Adaptive integration without CRF 79.28%
Naïve integration with CRF 79.79%
Adaptive integration with CRF 79.86%
66. Quantitative Evaluation
• Global voxel classification accuracy with majority voting
▫ Improves on the previous method by a large gap (+6.86%)
▫ Tested on ScanNet dataset (312 test scenes)
• Adaptive integration & CRF appear only marginally effective on this metric
▫ They mainly affect object boundaries: visually critical, but covering only a small portion of the data
Configurations Accuracy
Voxel-based labeling [Dai 2017] 73.0%
Naïve integration without CRF 79.02%
Adaptive integration without CRF 79.28%
Naïve integration with CRF 79.79%
Adaptive integration with CRF 79.86%
68. 2D Projection of 3D Segmentation
• Fusion & regularization improve semantic segmentation results
• We can render 2D semantic maps from the segmented 3D model
• Original 2D segmentation vs. rendered 2D results
▫ Tested on ScanNet dataset (53K frames from 312 test scenes)
Method                     Pixel Acc.  Mean Acc.  Mean IoU
Original RDFNet            60.44       47.32      29.34
Fine-tuned RDFNet (2D)     73.55       59.82      45.60
Our result (rendered 2D)   77.18       63.20      50.69
[Figure: quantitative comparison; input image, CNN results, our result]
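The three metrics in the table follow the standard definitions, which can be computed from a confusion matrix. This sketch uses the usual formulas (assumed, not taken from the paper): pixel accuracy is the diagonal mass, mean accuracy averages per-class recall, and mean IoU averages per-class intersection over union.

```python
# Standard semantic segmentation metrics from a confusion matrix (illustrative sketch).
# conf[g][p] counts pixels of ground-truth class g predicted as class p.

def segmentation_metrics(conf):
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[k][k] for k in range(n))
    pixel_acc = correct / total
    # Per-class accuracy: diagonal over ground-truth row sums (classes present only).
    class_acc = [conf[k][k] / sum(conf[k]) for k in range(n) if sum(conf[k]) > 0]
    mean_acc = sum(class_acc) / len(class_acc)
    # Per-class IoU: true positives over (row sum + column sum - true positives).
    ious = []
    for k in range(n):
        union = sum(conf[k]) + sum(conf[g][k] for g in range(n)) - conf[k][k]
        if union > 0:
            ious.append(conf[k][k] / union)
    mean_iou = sum(ious) / len(ious)
    return pixel_acc, mean_acc, mean_iou
```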
69. 3D Scene Completion and Manipulation
• Class-wise (semantic) 3D scene manipulation
• Scene completion
• Object modification
[Figure: input scene, semantic mesh, floor filling, object removal]
70. Summary
• Volumetric semantic fusion integrating 2D semantic predictions
→ exploits the success of 2D CNNs & data
• Adaptive integration based on depth and scene structure
→ compensates for the uncertainty of network predictions
• CRF-based label regularization using geometric and photometric information
→ refines the final result
71. Summary
• Volumetric semantic fusion integrating 2D semantic predictions
→ exploits the success of 2D CNNs & data
• Adaptive integration based on depth and scene structure
→ compensates for the uncertainty of network predictions
• CRF-based label regularization using geometric and photometric information
→ refines the final result
• Limitation
▫ 2D semantic segmentation requires heavy computation
▫ Multiple GPUs are needed to achieve real-time performance
72. Summary and Future Work
Color and 3D Semantic Reconstruction
of Indoor Scenes from RGB-D Streams
73. Summary
• 3D Reconstruction of auxiliary information
▫ Beyond the geometric reconstruction of the indoor scene
▫ Useful for a rich user experience in VR/AR applications
74. Summary
• 3D Color and Semantic Reconstruction of Indoor Scenes from RGB-D Streams
▫ Efficient and accurate color representation
Texture map generation using spatiotemporal key frame sampling and texture coordinate optimization
(Future work) Optimizing the texture map considering geometric and photometric consistency together
▫ Per-vertex dense semantic class information
3D semantic segmentation on a reconstructed scene via volumetric semantic fusion
(Future work) 3D instance segmentation of the reconstructed scene for individual object meshes
[Figures: texture map generation; semantic reconstruction]