Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality
• Education
• Korea University of Technology and Education (Mar. 2010 ~ Feb. 2017)
• Bachelor of Science in Computer Engineering
• Tokyo Institute of Technology (Apr. 2017 ~ )
• Master of Science in Computer Science candidate
• Koike laboratory (Vision-based human-computer interaction)
• Research assistant, Team Koike, CREST, JST
• Publications
[1] MonoEye: Monocular Fisheye Camera based 3D Human Pose Estimation, IEEE VR 2019 (poster abstract, accepted)
Dong-Hyun Hwang, Kohei Aso, and Hideki Koike
[2] ParaPara: Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality, ACM CHI 2018 Extended Abstract
Dong-Hyun Hwang and Hideki Koike
[3] MlioLight: Projector-camera Based Multi-layered Image Overlay System for Multiple Flashlights Interaction, ACM ISS 2018
Toshiki Sato, Dong-Hyun Hwang, and Hideki Koike
[4] AR based Self-sports Learning System using Decayed Dynamic Time Warping Algorithm, Eurographics ICAT-EGVE 2018
Atsuki Ikeda, Dong-Hyun Hwang, and Hideki Koike
• Research Interest
• AR, MR based interactive content synthesis
• Egocentric vision
• Computer vision-based interactive system
Dong-Hyun Hwang (황동현)
hwang.d.ab@m.titech.ac.jp
• Introduction
• Reviewing Free-viewpoint video synthesis systems
• FVV Synthesis with Multiple Cameras
• FVV Synthesis with a Monocular Video
• ParaPara System
• System Implementation
• Applications
• Discussion
• Interactive Demo
• Conclusions
Contents
• Free-viewpoint Video (FVV) is an advanced medium that
provides an immersive user experience and interaction.
• Providing flexible viewpoint navigation in 3D space.
• From limited linear-viewpoint selection to intuitive free-viewpoint
selection.
• Two ways to synthesize content.
• With multiple imaging equipment.
• With a monocular camera / video.
Free-viewpoint Video
• Reconstruct 3D information with multiple images.
• Structure from Motion (SfM), point clouds, image-based visual hulls, etc.
• Synthesized content is accurate and impressive.
• Accurate camera synchronization and a complex hardware configuration are required.
FVV Synthesis with Multiple Cameras
Photo Tourism (SfM based, 2005) Hardware configuration of Goorts et al.’s system (2013)
• Virtualized Reality and EyeVision (Kanade et al. 1995 and 2001)
• Capturing a target using 51 cameras mounted on a dome structure.
• Synthesizing an image from a virtual viewpoint.
• EyeVision, based on this technology, was used in Super Bowl XXXV.
• Image-based Visual Hulls (Matusik et al. 2000)
• Inverse projection (silhouette cone) from foreground silhouettes with
camera parameters.
• The visual hull is the intersection of the silhouette cones.
FVV Synthesis with Multiple Cameras (cont’d)
Virtualized Reality / EyeVision / IBVH
• High-Quality Streamable Free-Viewpoint Video (Collet et al. 2015)
• Capturing a target with 106 RGB and infrared cameras.
• Construct a 3D mesh model from captured point clouds.
• Encoding the content into streamable MPEG-DASH.
• Holoportation (Orts et al. 2016)
• Reducing the number of cameras (from 106 to 24).
• Real-time 3D reconstruction (34 fps).
FVV Synthesis with Multiple Cameras (cont’d)
Collet et al.’s system (2015) Holoportation (2016)
• Reconstruct 3D information from a single monocular video or camera.
• A typical under-constrained problem with inherent ambiguity.
• Requiring no special capturing equipment or environments.
• Reusing produced content.
FVV Synthesis with a Monocular Video
Input and output results of ParaPara
• Tour Into the Picture (Horry et al. 1997)
• Adding a virtual vanishing point through user interaction.
• Modeling a coarse 3D scene model.
• Synthesizing a virtual viewpoint’s view using homography transform.
FVV Synthesis with a Monocular Video (cont’d)
Original image and synthesized images from novel viewpoint.
http://andyzeng.github.io/homography
Modeling procedure
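The homography step above maps image points into the novel view through a 3×3 matrix. A minimal NumPy sketch of that mapping (the matrix H below is an illustrative translation, not a matrix from the paper):

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 image points through a 3x3 homography (with perspective divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide by the w component

# Illustrative H: a pure translation, which shifts every point by (5, 3).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [10.0, 20.0]])
print(apply_homography(H, pts))  # each point shifted by (5, 3)
```

In practice the full warp is applied to the whole image (e.g. OpenCV's `cv2.warpPerspective`), not point by point.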
• Soccer on Your Tabletop (Rematas et al. 2018)
• Deep neural networks based system.
• Detecting players and tracking them.
• Reconstructing player’s depth map and mesh.
• Estimating a camera pose from landmarks of the soccer field.
• Generative Query Network (Eslami et al. 2018)
• Representation network produces a vector which describes observations.
• Generation network predicts the scene from an unobserved viewpoint.
FVV Synthesis with a Monocular Video (cont’d)
Procedures of Soccer on Your Tabletop system Input and output results of GQN
• User-generated Content (UGC) is content created from the user side.
• It has risen with Web 2.0.
• Ranging from simple photo sharing to 360-degree VR content.
• The impact of UGC is the delivery of a tremendous amount of content.
User-generated Content
• Most FVV synthesis systems have problems:
• Requiring multiple imaging equipment.
• Non-end-user-friendly system configuration.
• Unable to use existing content.
• The goal of our research is to develop an end-to-end system,
which synthesizes pseudo-2.5D free-viewpoint content from
monocular videos for creating and disseminating UGC.
Problem Definition and Research Objective
ParaPara
Monocular Video Pseudo-2.5D
FVV content
ParaPara System
https://youtu.be/F2H1L2Pqj0c
ParaPara System Configuration
Scene Synthesizer
• A hybrid module (DNNs and common image processing algorithms)
that synthesizes an FVV scene from monocular videos.
Scene Synthesizer
• OpenPose (DNN based) is used to detect persons in a video sequence.
• Two modes (normal mode, precision mode) are provided based on
network input resolution.
• To compensate for detection failures, each detected person is tracked.
• The bounding box is refined based on the detected joints.
Person Detecting and Tracking
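The slide does not give the exact refinement rule, so the following is only a plausible sketch: derive a padded bounding box from the 2D joints OpenPose returns, ignoring low-confidence joints (the threshold and margin values are assumptions):

```python
import numpy as np

def refine_bbox(joints, conf, conf_thresh=0.3, margin=0.1):
    """Refine a person bounding box from 2D joints (Nx2) and confidences (N,).

    Only joints above conf_thresh contribute; the box is padded by `margin`
    (a fraction of the box size) to cover body parts beyond the skeleton.
    Returns None on detection failure so the tracker can take over.
    """
    valid = joints[conf > conf_thresh]
    if len(valid) == 0:
        return None
    x0, y0 = valid.min(axis=0)
    x1, y1 = valid.max(axis=0)
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
```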
• With minimal user interaction, a homography matrix
(image plane to real-world ground) is calculated.
• The pseudo position is calculated by applying the homography
matrix to the ankle positions of each detected person.
Pseudo-3D Position Estimation
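The ground-plane projection described above can be sketched in NumPy. The homography values below are purely illustrative, not a calibrated image-to-world matrix:

```python
import numpy as np

def ground_position(H_img2world, ankle_xy):
    """Project a detected ankle pixel onto the real-world ground plane
    using the image-to-world homography obtained via user interaction."""
    p = np.array([ankle_xy[0], ankle_xy[1], 1.0])
    w = H_img2world @ p
    return w[:2] / w[2]   # position on the ground plane; height is unknown

# Illustrative homography that simply scales pixels to metres.
H = np.array([[0.01, 0.0, 0.0],
              [0.0, 0.01, 0.0],
              [0.0, 0.0, 1.0]])
print(ground_position(H, (320, 480)))  # ankle pixel -> ground-plane metres
```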
• Persons’ textures are extracted by KNN-based background subtraction
and the refined bounding boxes.
• A 1-D mean filter makes contours smoother.
• A texture size correction method minimizes distortion caused
by perspective error.
Texture Extraction and Size Correction
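The 1-D mean filter over the contour can be sketched as follows. The window is circular since a contour is closed; the window size `k` is an assumption, and in practice the mask itself would come from OpenCV's KNN background subtractor (`cv2.createBackgroundSubtractorKNN`):

```python
import numpy as np

def smooth_contour(contour, k=5):
    """Apply a circular 1-D mean filter to an Nx2 contour, smoothing the
    jagged edges left by background subtraction (k = odd window size)."""
    n = len(contour)
    # For each point, the indices of its k neighbours, wrapping around.
    idx = (np.arange(n)[:, None] + np.arange(-(k // 2), k // 2 + 1)) % n
    return contour[idx].mean(axis=1)  # average each point with its neighbours
```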
• Each texture is placed in the 3D-world based on the calculated
pseudo-3D position.
• The extracted background image or a custom image is set to the
ground texture.
Scene Synthesis
Content Player
• Playing content synthesized by the scene synthesizer.
• Working on a mixed reality head-mounted display.
• Providing interaction between the user and the content.
Content Player
• Generated content is displayed in the real world at 30 fps.
• Spatial mapping allows content to be freely placed on real-world
objects.
• Head Related Transfer Function (HRTF) based spatial sound method
synthesizes a directional sound according to the position of the
content.
Playing Synthesized Content
• The texture is extracted from a monocular video and does not include
information that was not captured in the original video.
• Axial billboard rendering (about the y-axis) applied in ParaPara minimizes
the unnaturalness of the generated content’s 2D textures when the
viewpoint changes.
Billboard Rendering
w/o billboard rendering w/ billboard rendering
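Axial (y-axis) billboarding rotates each texture quad about the vertical axis only, so it faces the camera while staying upright. A minimal sketch of the yaw computation (names are illustrative; a game engine would apply this as the quad's rotation):

```python
import numpy as np

def y_axis_billboard_yaw(object_pos, camera_pos):
    """Yaw (radians) that rotates a texture quad about the world y-axis so
    it faces the camera; pitch/roll stay zero, keeping the person upright."""
    dx = camera_pos[0] - object_pos[0]
    dz = camera_pos[2] - object_pos[2]
    return np.arctan2(dx, dz)  # 0 when the camera is straight ahead on +z

# A camera directly along +z from the object needs no rotation.
print(y_axis_billboard_yaw((0, 0, 0), (0, 1.7, 5)))
```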
• Accuracy of Depth Estimation
• Accuracy of Texture Extraction
• Processing Speed
Evaluation Metrics
• Ground truth is generated under short-range (5 participants) and
long-range (virtual environment) conditions.
• Calculating the average z-axis error.
• Evaluation function: mean absolute error.
Accuracy of Depth Estimation
Short range condition Long range condition
• Average z-axis error (short range): 24.57 cm
• Average z-axis error (long range): 76.04 cm
• The error increased as the model moved away from the camera.
• As the ground-truth distance increases, the quantization error
also increases.
Accuracy of Depth Estimation
• Comparing the proposed method (background-subtraction based) with
Mask R-CNN (state-of-the-art, DNN based) as ground truth.
• Mask IoU = 0.72 (SD = 0.05).
• An IoU score > 0.5 is normally considered a “good” prediction.
Accuracy of Texture Extraction
(a) Mask R-CNN (blue region), (b) ours (red region)
(c) overlapped two methods (purple region is the intersection area).
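Mask IoU here is the intersection over union of two segmentation masks; a small NumPy sketch with a toy example:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean segmentation masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

a = np.zeros((4, 4), bool); a[:2, :] = True   # top two rows (8 px)
b = np.zeros((4, 4), bool); b[1:3, :] = True  # middle two rows (8 px)
print(mask_iou(a, b))  # intersection 4 px / union 12 px = 1/3
```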
• The average processing speed is approximately 180 ms per frame
(450 ms in precision mode).
• The proposed texture extraction method is faster than Mask R-CNN
(avg. processing time with Mask R-CNN: 2683 ms).
• By combining DNNs and common image processing algorithms, the
processing speed is higher than that of fully DNN-based systems.
Processing Speed
Average processing time per frame: 179.36 ms (normal mode), 448.47 ms (precision mode).
Processing time of the texture extraction procedure: 41.43 ms (proposed method), 2545 ms (Mask R-CNN).
• Evaluating the effectiveness of the content synthesized by the proposed
system against the original monocular videos.
• Three comparison conditions (C1, C2, C3).
• Twelve participants (three females, M = 26, SD = 9.23).
• A 5-point Likert questionnaire was used
(1 = Strongly Disagree to 5 = Strongly Agree).
Experiment Design
• Users can visually recognize
the spatial information
through the proposed
method C3 more easily than
C1 and C2 (p ≤ 0.001).
• Users could feel the
stereoscopic effect on the
content created by
the proposed system
compared with the existing
monocular video.
Visual Depth Perception (Stereoscopy)
C1: Monocular Video + 2D display
C2: Monocular Video + MR HMD
C3: Synthesized Content + MR HMD (with red box)
• A significant difference was
observed between C3
(proposed method) and other
conditions (p ≤ 0.001).
• Content generated by the
proposed system affects the
user’s immersion.
Immersive Degree (Immersion)
C1: Monocular Video + 2D display
C2: Monocular Video + MR HMD
C3: Synthesized Content + MR HMD (with red box)
• A significant difference was
observed between C3
(proposed method) and
other conditions (p ≤ 0.001).
• C3 provided the most
interesting experience
among the methods.
• The stereoscopic experience
elicited positive responses
from participants.
Attractiveness
C1: Monocular Video + 2D display
C2: Monocular Video + MR HMD
C3: Synthesized Content + MR HMD (with red box)
• The user can easily convert a monocular sports video into
immersive sports content.
Immersive Sports Broadcasting
Input video
(ISSIA-CNR dataset)
Synthesized content
• Various entertainment videos on the Internet can be converted
into attractive FVV content with ParaPara.
Dynamic Entertainment Content
Input video Synthesized content
• ParaPara can synthesize multiple videos into a single scene.
• The user can intuitively perceive spatial information and track a target
as it moves from one camera’s viewpoint to another’s.
Effective Surveillance System
Synthesized content from CMUSRD dataset
• High versatility and low cost
• Creating FVV content from monocular videos.
• High usability
• End-user without expertise can use the system.
• Reasonable quality
• Monocular videos can be converted into immersive content.
• Fast processing speed
• The system is faster than fully DNN based systems.
Advantages of ParaPara
• Limited camera posture
• Only fixed viewpoint videos are supported.
• Pseudo-3D position
• The y-axis (height) position cannot be estimated.
• Texture artifacts
• Detection failures cause artifacts.
• 2D texture
• Information on surfaces not facing the camera is lost and cannot be restored.
Technical Challenges and Limitations
• Applying deep neural networks to a wider range of procedures.
• Depth estimation.
• Camera pose estimation.
• Recovering lost information.
• Converting a detected person’s silhouette into a fitted 3D model.
• Using generative models (GAN, autoencoder, etc.) for texture recovery.
Future Work
Pipeline of Photo Wake-Up (2018) Warping-GAN (2018)
Contributions
• ParaPara, an alternative system that synthesizes FVV content from
single or multiple monocular videos,
• performance evaluations of the proposed system,
• a user study assessing the usability of the synthesized content,
• sample applications that the proposed system is well suited for.
Summary
Problems
• Requiring multiple imaging equipment.
• Non-end-user-friendly system configuration.
• Unable to use existing content.
Goals
• Creating FVV content without multiple imaging equipment.
• Increasing the system’s usability.
• Utilizing current content and equipment.
Research Goals
Part of this work has been presented at ACM CHI 2018 EA:
ParaPara: Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality.
Dong-Hyun Hwang and Hideki Koike.
This work was supported in part by a grant from
JST CREST, Grant Number JPMJCR17A3, Japan:
“A study on skill acquisition mechanism and development of skill transfer systems.”

More Related Content

What's hot

Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Greeshma M.S.R
 
【ECCV 2016 BNMW】Human Action Recognition without Human
【ECCV 2016 BNMW】Human Action Recognition without Human【ECCV 2016 BNMW】Human Action Recognition without Human
【ECCV 2016 BNMW】Human Action Recognition without Human
Hirokatsu Kataoka
 
Super Resolution of Image
Super Resolution of ImageSuper Resolution of Image
Super Resolution of Image
Satheesh K
 

What's hot (20)

Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
 
AR/SLAM for end-users
AR/SLAM for end-usersAR/SLAM for end-users
AR/SLAM for end-users
 
Image super resolution based on
Image super resolution based onImage super resolution based on
Image super resolution based on
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
 
更適應性的AOI-深度強化學習之應用
更適應性的AOI-深度強化學習之應用更適應性的AOI-深度強化學習之應用
更適應性的AOI-深度強化學習之應用
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
【ECCV 2016 BNMW】Human Action Recognition without Human
【ECCV 2016 BNMW】Human Action Recognition without Human【ECCV 2016 BNMW】Human Action Recognition without Human
【ECCV 2016 BNMW】Human Action Recognition without Human
 
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
 
160205 NeuralArt - Understanding Neural Representation
160205 NeuralArt - Understanding Neural Representation160205 NeuralArt - Understanding Neural Representation
160205 NeuralArt - Understanding Neural Representation
 
Super Resolution of Image
Super Resolution of ImageSuper Resolution of Image
Super Resolution of Image
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Image Translation with GAN
Image Translation with GANImage Translation with GAN
Image Translation with GAN
 
Adaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom predictionAdaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom prediction
 

Similar to Synthesizing pseudo 2.5 d content from monocular videos for mixed reality

Similar to Synthesizing pseudo 2.5 d content from monocular videos for mixed reality (20)

High Quality Video Simulation from Still Images
High Quality Video Simulation from Still ImagesHigh Quality Video Simulation from Still Images
High Quality Video Simulation from Still Images
 
slide-171212080528.pptx
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptx
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
Cartoonization of images using machine Learning
Cartoonization of images using machine LearningCartoonization of images using machine Learning
Cartoonization of images using machine Learning
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
 
Luigy Bertaglia Bortolo - Poster Final
Luigy Bertaglia Bortolo - Poster FinalLuigy Bertaglia Bortolo - Poster Final
Luigy Bertaglia Bortolo - Poster Final
 
Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
 
Fast object re detection and localization in video for spatio-temporal fragme...
Fast object re detection and localization in video for spatio-temporal fragme...Fast object re detection and localization in video for spatio-temporal fragme...
Fast object re detection and localization in video for spatio-temporal fragme...
 
DreamPose: Fashion Image to Video Synthesis via Stable Diffusion
DreamPose: Fashion Image to Video Synthesis via Stable DiffusionDreamPose: Fashion Image to Video Synthesis via Stable Diffusion
DreamPose: Fashion Image to Video Synthesis via Stable Diffusion
 
NMSL_2017summer
NMSL_2017summerNMSL_2017summer
NMSL_2017summer
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUS
 
Detection of a user-defined object in an image using feature extraction- Trai...
Detection of a user-defined object in an image using feature extraction- Trai...Detection of a user-defined object in an image using feature extraction- Trai...
Detection of a user-defined object in an image using feature extraction- Trai...
 
40120140502005
4012014050200540120140502005
40120140502005
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision
 
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
 
A Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated ContentsA Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated Contents
 
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
 

More from NAVER Engineering

More from NAVER Engineering (20)

React vac pattern
React vac patternReact vac pattern
React vac pattern
 
디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX
 
진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)
 
서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트
 
BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호
 
이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라
 
날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기
 
쏘카프레임 구축 배경과 과정
 쏘카프레임 구축 배경과 과정 쏘카프레임 구축 배경과 과정
쏘카프레임 구축 배경과 과정
 
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
 
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
 
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
 
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
 
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
 
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
 
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
 
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
 
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
 
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
 
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
 
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
 

Recently uploaded

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 

Recently uploaded (20)

Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Thermal Engineering Unit - I & II . ppt

Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality

  • 1.
  • 2. • Education • Korea University of Technology and Education (Mar. 2010 ~ Feb. 2017) • Bachelor of Science in Computer Engineering • Tokyo Institute of Technology (Apr. 2017 ~ ) • Master of Science in Computer Science candidate • Koike laboratory (Vision-based human-computer interaction) • Research assistant, Team Koike, CREST, JST • Publications [1] MonoEye: Monocular Fisheye Camera based 3D Human Pose Estimation, IEEE VR 2019 (poster abstract, accepted) Dong-Hyun Hwang, Kohei Aso, and Hideki Koike [2] ParaPara: Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality, ACM CHI 2018 Extended Abstract Dong-Hyun Hwang and Hideki Koike [3] MlioLight: Projector-camera Based Multi-layered Image Overlay System for Multiple Flashlights Interaction, ACM ISS 2018 Toshiki Sato, Dong-Hyun Hwang, and Hideki Koike [4] AR based Self-sports Learning System using Decayed Dynamic Time Warping Algorithm, Eurographics ICAT-EGVE 2018 Atsuki Ikeda, Dong-Hyun Hwang, and Hideki Koike • Research Interest • AR, MR based interactive content synthesis • Egocentric vision • Computer vision-based interactive system Dong-Hyun Hwang (황동현) hwang.d.ab@m.titech.ac.jp
  • 3. • Introduction • Reviewing Free-viewpoint video synthesis systems • FVV Synthesis with Multiple Cameras • FVV Synthesis with a Monocular Video • ParaPara System • System Implementation • Applications • Discussion • Interactive Demo • Conclusions Contents
  • 4.
  • 5. • Free-viewpoint Video (FVV) is an advanced medium that provides an immersive user experience and interaction. • It provides flexible viewpoint navigation in 3D space. • Viewers move from limited linear viewpoint selection to intuitive free-viewpoint selection. • There are two ways to synthesize content: • With multiple pieces of imaging equipment. • With a monocular camera / video. Free-viewpoint Video
  • 6. • Reconstruct 3D information from multiple images. • Structure from Motion (SfM), point clouds, image-based visual hulls, etc. • Synthesized content is accurate and impressive. • An accurate synchronization method and a complex configuration are required. FVV Synthesis with Multiple Cameras Photo Tourism (SfM based, 2005) Hardware configuration of Goorts et al.’s system (2013)
  • 7. • Virtualized Reality and EyeVision (Kanade et al. 1995 and 2001) • Capturing a target using 51 cameras in a dome structure. • Synthesizing an image from a virtual viewpoint. • EyeVision, based on this technology, was used in Super Bowl XXXV. • Image-based Visual Hulls (Matusik et al. 2000) • Inverse projection (silhouette cones) from foreground silhouettes using camera parameters. • The visual hull is the intersection of the silhouette cones. FVV Synthesis with Multiple Cameras (cont’d) EyeVision, Virtualized Reality, IBVH
  • 8. • High-Quality Streamable Free-Viewpoint Video (Collet et al. 2015) • Capturing a target with 106 RGB and infrared cameras. • Constructing a 3D mesh model from captured point clouds. • Encoding the content for streaming via MPEG-DASH. • Holoportation (Orts et al. 2016) • Reducing the number of cameras (from 106 to 24). • Real-time 3D reconstruction (34 fps). FVV Synthesis with Multiple Cameras (cont’d) Collet et al.’s system (2015) Holoportation (2016)
  • 9. • Reconstruct 3D information from a single monocular video or camera. • A typical under-constrained problem with inherent ambiguity. • No special capturing equipment or environment is required. • Existing content can be reused. FVV Synthesis with a Monocular Video Input and output results of ParaPara
  • 10. • Tour Into the Picture (Horry et al. 1997) • Adding a virtual vanishing point through user interaction. • Modeling a simple 3D scene. • Synthesizing a virtual viewpoint’s view using a homography transform. FVV Synthesis with a Monocular Video (cont’d) Original image and synthesized images from novel viewpoints. http://andyzeng.github.io/homography Modeling procedure
  • 11. • Soccer on Your Tabletop (Rematas et al. 2018) • A deep neural network based system. • Detecting and tracking players. • Reconstructing each player’s depth map and mesh. • Estimating the camera pose from landmarks of the soccer field. • Generative Query Network (Eslami et al. 2018) • A representation network produces a vector that describes the observations. • A generation network predicts the scene from an unobserved viewpoint. FVV Synthesis with a Monocular Video (cont’d) Procedures of the Soccer on Your Tabletop system Input and output results of GQN
  • 12.
  • 13. • User-generated Content (UGC) is content created on the user side. • It arose with Web 2.0. • It ranges from simple photo sharing to 360-degree VR content. • The impact of UGC is the delivery of a tremendous amount of content. User-generated Content
  • 14. • Most FVV synthesis systems have problems: • Requiring multiple pieces of imaging equipment. • Non-end-user-friendly system configuration. • Unable to use existing content. • The goal of our research is to develop an end-to-end system that synthesizes pseudo-2.5D free-viewpoint content from monocular videos for creating and disseminating UGC. Problem Definition and Research Objective ParaPara Monocular Video Pseudo-2.5D FVV content
  • 18. • A hybrid module (DNNs and conventional image processing algorithms) synthesizes an FVV scene from monocular videos. Scene Synthesizer
  • 19. • OpenPose (DNN based) is used to detect persons in a video sequence. • Two modes (normal mode, precision mode) are provided based on the network input resolution. • To compensate for detection failures, detected persons are tracked. • Bounding boxes are refined based on the detected joints. Person Detecting and Tracking
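The deck does not spell out the refinement rule itself; a minimal sketch, assuming the refined box is the tight hull of the detected joints plus a fractional margin (the `refine_bbox` name and the `margin` value are hypothetical), could look like:

```python
def refine_bbox(joints, margin=0.1):
    """Refine a person's bounding box from detected 2D joint positions.

    joints: list of (x, y) joints, assumed already confidence-filtered.
    margin: fractional padding added around the tight joint box.
    Returns (x_min, y_min, x_max, y_max).
    """
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)
```

Padding the tight joint hull compensates for joints (hands, hair, clothing) that lie slightly outside the skeleton.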
  • 20. • With minimal user interaction, a homography matrix (from the image plane to the real-world ground plane) is calculated. • The pseudo position is calculated with the homography matrix based on the ankle positions of the detected person. Pseudo-3D Position Estimation
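Mapping an ankle pixel to a ground-plane position with the estimated homography is a standard projective transform; a self-contained sketch (the matrix values would come from the user-provided correspondences, which the slide does not list):

```python
def apply_homography(H, pt):
    """Map an image point to the ground plane with a 3x3 homography H.

    H: row-major 3x3 matrix (list of lists).
    pt: (u, v) pixel position, e.g. the detected person's ankle point.
    Returns the (x, z) pseudo position on the ground plane.
    """
    u, v = pt
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    z = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, z / w)  # homogeneous divide
```

In practice the ankle point is taken at the bottom center of the refined bounding box, where the person touches the ground plane.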
  • 21. • Persons’ textures are extracted by KNN-based background subtraction and the refined bounding boxes. • A 1-D mean filter smooths the contours. • A texture size correction method minimizes distortion caused by perspective error. Texture Extraction and Size Correction
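The 1-D mean filter can be sketched as a circular moving average over the contour points (the window size `k=3` is an assumption; the slide does not state it):

```python
def smooth_contour(points, k=3):
    """Smooth a closed contour with a circular 1-D mean filter.

    points: list of (x, y) contour points; k: odd window size.
    The window wraps around because the contour is closed.
    """
    n, r = len(points), k // 2
    out = []
    for i in range(n):
        win = [points[(i + j) % n] for j in range(-r, r + 1)]
        out.append((sum(p[0] for p in win) / k,
                    sum(p[1] for p in win) / k))
    return out
```

Averaging neighboring contour points suppresses the single-pixel jaggies that background subtraction tends to leave on silhouette boundaries.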
  • 22. • Each texture is placed in the 3D world based on the calculated pseudo-3D position. • The extracted background image or a custom image is set as the ground texture. Scene Synthesis
  • 24. • Playing content synthesized by the scene synthesizer. • Running on a mixed reality head-mounted display. • Providing interaction between the user and the content. Content Player
  • 25. • Generated content is displayed in the real world at 30 fps. • Spatial mapping allows content to be freely placed on real-world objects. • A Head-Related Transfer Function (HRTF) based spatial sound method synthesizes directional sound according to the position of the content. Playing Synthesized Content
  • 26. • The texture is extracted from a monocular video and therefore lacks any information not captured in the original video. • Axial billboard rendering (about the y-axis) minimizes the unnaturalness of the generated content’s 2D textures when the viewpoint changes. Billboard Rendering w/o billboard rendering w/ billboard rendering
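The y-axis (axial) billboard constraint reduces to a single yaw rotation toward the camera; a minimal sketch of that rotation (the function name is hypothetical, not from the system):

```python
import math

def billboard_yaw(quad_pos, camera_pos):
    """Yaw angle (radians) about the y axis that turns a texture quad
    toward the camera while keeping it upright (axial billboard).

    quad_pos, camera_pos: (x, y, z) world positions.
    """
    dx = camera_pos[0] - quad_pos[0]
    dz = camera_pos[2] - quad_pos[2]
    return math.atan2(dx, dz)  # 0 when the camera is straight ahead on +z
```

Note that the camera's height does not enter the formula: the quad only spins about its vertical axis, which is what keeps a flat 2D texture from visibly tilting as the viewer moves.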
  • 28. • Accuracy of Depth Estimation • Accuracy of Texture Extraction • Processing Speed Evaluation Metrics Private
  • 29. • Ground truth is generated under short-range (5 participants) and long-range (virtual environment) conditions. • The average z-axis error is calculated. • Evaluation function: mean absolute error. Accuracy of Depth Estimation Short range condition Long range condition
  • 30. • Average z-axis error (short range): 24.57 cm. • Average z-axis error (long range): 76.04 cm. • The error increased as the model moved away from the camera. • As the distance in the ground truth images increases, the quantization error also increases. Accuracy of Depth Estimation
  • 31. • The proposed method (background subtraction based) is compared with Mask R-CNN (state of the art, DNN based) as ground truth. • Mask IoU = 0.72 (SD = 0.05). • An IoU score > 0.5 is normally considered a “good” prediction. Accuracy of Texture Extraction (a) Mask R-CNN (blue region), (b) ours (red region), (c) the two methods overlapped (the purple region is the intersection).
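The mask IoU reported here is the standard intersection-over-union between binary masks; a pure-Python sketch of the metric:

```python
def mask_iou(a, b):
    """Intersection over union of two binary masks (2D lists of 0/1)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0
```

An IoU of 0.72 means that 72% of the area covered by either mask is covered by both, i.e. the background-subtraction masks largely agree with the Mask R-CNN masks.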
  • 32. • The average processing speed is approximately 180 ms per frame (450 ms in precision mode). • The proposed texture extraction method is faster than Mask R-CNN (avg. processing time with Mask R-CNN: 2683 ms). • By combining DNNs and conventional image processing algorithms, the processing speed is higher than that of fully DNN-based systems. Processing Speed [Chart: average processing time per frame — 179.36 ms (normal mode), 448.47 ms (precision mode)] [Chart: processing time of texture extraction — 41.43 ms (proposed method), 2545 ms (Mask R-CNN)]
  • 34. • Evaluate the effectiveness of the content synthesized by the proposed system against the original monocular videos. • Three comparison conditions (C1, C2, C3). • Twelve participants (three females, M = 26, SD = 9.23). • A 5-point Likert-based questionnaire was used. • 1 = Strongly Disagree to 5 = Strongly Agree. Experiment Design
  • 35. • Users can visually recognize spatial information through the proposed method (C3) more easily than through C1 and C2 (p ≤ 0.001). • Users could feel the stereoscopic effect in the content created by the proposed system compared with the existing monocular video. Visual Depth Perception (Stereoscopy) Monocular Video + 2D display Monocular Video + MR HMD Synthesized Content + MR HMD (w/ red box)
  • 36. • A significant difference was observed between C3 (proposed method) and the other conditions (p ≤ 0.001). • Content generated by the proposed system affects the user’s immersion. Immersive Degree (Immersion) Monocular Video + 2D display Monocular Video + MR HMD Synthesized Content + MR HMD (w/ red box)
  • 37. • A significant difference was observed between C3 (proposed method) and the other conditions (p ≤ 0.001). • C3 provided the most interesting experience among the methods. • The stereoscopic experience drew positive subject responses. Attractiveness Monocular Video + 2D display Monocular Video + MR HMD Synthesized Content + MR HMD (w/ red box)
  • 38.
  • 39. • The user can easily convert a monocular sports video into immersive sports content. Immersive Sports Broadcasting Input video (ISSIA-CNR dataset) Synthesized content
  • 40. • Various entertainment videos on the Internet can be converted into attractive FVV content with ParaPara. Dynamic Entertainment Content Input video Synthesized content
  • 41. • ParaPara can synthesize multiple videos into a single scene. • The user can intuitively perceive spatial information and track the target as it moves from one camera’s viewpoint to another’s. Effective Surveillance System Synthesized content from the CMUSRD dataset
  • 42.
  • 43. • High versatility and low cost • Creating FVV content from monocular videos. • High usability • End users without expertise can use the system. • Reasonable quality • Monocular videos can be converted into immersive content. • Fast processing speed • The system is faster than fully DNN-based systems. Advantages of ParaPara
  • 44. • Limited camera posture • Only fixed-viewpoint videos are supported. • Pseudo-3D position • The y-axis position cannot be estimated. • Texture artifacts • Detection failures cause artifacts. • 2D texture • Information not facing the camera is lost and cannot be restored. Technical Challenges and Limitations
  • 45. • Applying deep neural networks to a wider range of procedures. • Depth estimation. • Camera pose estimation. • Recovering lost information. • Converting a detected person’s silhouette into a fitted 3D model. • Using generative models (GAN, autoencoder, etc.) for texture recovery. Future Work Pipeline of Photo Wake-Up (2018) Warping-GAN (2018)
  • 46.
  • 47. Contributions • ParaPara, an alternative system that synthesizes FVV content from single or multiple monocular videos, • performance evaluations of the proposed system, • a user study assessing the usability of the synthesized content, • sample applications that demonstrate what the proposed system is capable of. Summary • Requiring multiple pieces of imaging equipment. • Non-end-user-friendly system configuration. • Unable to use existing content. • Creating FVV content without multiple pieces of imaging equipment. • Increasing the system’s usability. • Utilizing current content and equipment. Research Goals Part of this work has been presented at ACM CHI 2018 EA. ParaPara: Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality. Dong-Hyun Hwang and Hideki Koike. Problems This work was supported in part by a grant from JST CREST, Grant Number JPMJCR17A3, Japan: “A study on skill acquisition mechanism and development of skill transfer systems.”