HoloSwap: Object Removal and Replacement using Microsoft Hololens
Yun Suk Chang∗
Deepashree Gurumurthy†
Hannah Wolfe‡
ABSTRACT
Object removal and replacement in augmented reality has applica-
tions in interior design, marker hiding and many other fields. This
project explores the viability of object removal and replacement
for the Microsoft HoloLens. We implemented gesture-based object selection, video inpainting, tracking, and three different ways to display our results: (1) the display plane and a white Stanford bunny [24] placed at the inpainted depth, (2) a solid display plane with video inpainting covering the field of view, and (3) a vignetted view of the video inpainting. We found that placing the display plane and a white Stanford bunny at the inpainted depth produced the most reasonable results. Due to the semi-transparency of the HoloLens display, replacing the object with a white mesh and placing it against a light background is important. We also tested three different inpainting masks for the updating frame. Our results show that the HoloLens can be a reasonable solution for object replacement in augmented reality in certain cases.
1 INTRODUCTION
With increasing research activity and popularity, the number of augmented reality (AR) applications has grown in many different areas. One popular AR application is object removal and replacement. Such applications range from interior design [23] to replacing a tracking marker in marker-based AR [12], using diminished reality techniques. There has been much work on tablet- and other mobile-device-based diminished reality applications [8, 12, 14]. However, little is known about diminished reality on see-through, head-mounted displays.
There are many advantages to bringing diminished reality applications to see-through, head-mounted display devices. First, the user is able to see the replaced or removed object with the correct projection from their own view rather than from the camera's projection. Second, the user gets a better experience since the actual object is hidden from their view at all times while they are wearing the display, preventing breaks in immersion. Lastly, in see-through displays, only the diminished portion of the real-world view is affected by display and camera distortions, unlike a camera view, where the whole view is affected.
In this work, we perform diminished reality on the Microsoft HoloLens using an inpainting technique to test the viability of object replacement on a see-through, head-mounted display. We test this viability in three different ways: (1) placing an inpainted display frame and a replacement object on the object selected for replacement to test object replacement, (2) running a video-based inpainting algorithm in view to test object removal under changing viewpoints, and (3) blending an inpainted display frame into the scene to test how small a seam can be achieved.
Through our evaluation, we discovered the limitations of diminished reality on see-through, head-mounted displays, including problems of eye separation and dominance, the limited field of view, and the inability to completely block out the real-world scene.
∗e-mail: ychang@cs.ucsb.edu
†e-mail: deepashree@umail.ucsb.edu
‡e-mail: wolfe@cs.ucsb.edu
However, our results show that in certain environments, a see-through display device can achieve reasonable object replacement and removal.
2 RELATED WORK
2.1 Object Selection in Augmented Reality
There has been much work on selecting objects in 3D space using augmented reality. For example, Olwal et al. [19] used a tracked glove to point and cast a primitive object into the scene to select an area of interest, which is then used for object selection by statistical analysis. Oda et al. [18], on the other hand, use a tracked Wii remote to select the object by moving an adjustable sphere onto the object of interest. More recently, Miksik et al. [15] presented the Semantic Paintbrush for selecting real-world objects via a ray-casting technique and semantic segmentation. Our drawing method can be considered a pointing or ray-casting technique, almost like using a precise can of spray paint [3, 15, 17].
2.2 Inpainting in Augmented Reality
Enomoto et al. [8] implemented inpainting of objects using multiple cameras. Later, predetermined object removal for augmented reality was implemented by Herling et al. [10]. Researchers have also used inpainting to cover markers in augmented reality applications [12, 14]. In those papers, they do not remove objects, but merely cover predetermined 2D markers with textures. We could not find any examples of video inpainting in an augmented reality head-mounted display.
2.3 Origins of Inpainting
Inpainting was first done digitally in 1995 to fix scratches in film [13]. Harrison proposed a non-hierarchical procedure for re-synthesis of complex textures [9]: an output image was generated by adding pixels selected from the input image. This method produced good results but had long run times.
Other researchers tried to accelerate Harrison's algorithm, and while their results were potentially faster, they caused more artifacts [7, 20]. Many of these papers tried to combine texture synthesis, which replicates textures ad infinitum over large regions using exemplar-based techniques, with inpainting, which fills small gaps in images by propagating linear structures via diffusion. "Fragment-based image completion" would leave blurry patches [7]. "Structure and texture filling-in of missing image blocks in wireless transmission and compression applications" would inpaint some sections and use texture synthesis in others, causing inconsistencies [20]. Criminisi et al. were the first group to combine texture synthesis and inpainting effectively and efficiently, by inpainting based on a priority function instead of raster-scan order [5].
The PatchMatch algorithm proposed by Barnes et al. is the basis for many modern inpainting algorithms. Their algorithm finds similar patches and copies data from surrounding patches to inpaint areas [1].
2.4 State of the Art Inpainting
Inpainting has been performed on video sequences in recent years. "Video Inpainting of Complex Scenes" [16] produces very good inpainting results; the algorithm stores information from previous frames and can thereby also recreate missing text. "Real time video inpainting using PixMix" [11] also accounted for illumination changes between frames while inpainting.
"Exemplar-Based Image Inpainting Using a Modified Priority Definition" [6] produced high-quality output images by propagating patches based on a modified priority definition. Lately, novel inpainting algorithms have been included in a pipeline for modular real-time diminished reality for indoor applications [22]. However, this work only proposed a method that could be used on augmented reality devices and did not use such a device in its implementation.
3 APPROACH
The pipeline for an interactive application to select, remove, and replace objects in augmented reality has many parts. The desired object to be removed or replaced is first selected through interaction with the HoloLens. For object removal, we use an inpainting method based on PatchMatch [1]. The first frame is captured through the HoloLens, and the output for that frame is an inpainted area at the location of the selected object. We used different approaches to inpaint the current frame: the first method used information from the entire frame, the second used only information surrounding the selected object, and the last used information from the previously inpainted frame plus a small area around the selected object in the current frame. In order to achieve effective inpainting of consecutive frames at the object's location in world space, we incorporated object tracking into our algorithm. This helps inpaint the current location of the object in every frame, thereby accounting for slight changes in the camera position. Object replacement is implemented by placing a plane in the scene and placing a 3D mesh (the Stanford Bunny) over the location of the selected object. We also display the full screen capture in the field of view, showing the resulting inpainted sequence of frames.
Figure 1: (a) Shows what the user would see while selecting an area. (b) Shows a selected object.
3.1 Object Selection
For object selection, we let the user draw a 3D contour around the object through the HoloLens. For the drawing method, we designed the Surface-Drawing method, which lets the user draw on the detected real-world surface in the following way: for the drawing gesture, we use a pinch-and-drag gesture to start drawing on the detected surface and a release-pinch gesture to finish it. To reduce noise in the gesture input, we sample the user's drawing positions at 30 Hz and the finished annotation's path points at one point per millimeter. For drawing on the surface, we define the drawing position as the intersection between the detected surface mesh data and a ray cast from the user's head through the fingertip position. Consequently, as the user is drawing annotations, the user can easily verge on the object of interest since the annotation is displayed at the detected surface. The drawing method was found to be effective in pilot studies.
The completed contour drawing is then transformed into the 2D pixel-space coordinate system so that it can be used by the inpainting algorithm.
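As a minimal sketch of this transformation (not the exact code used in our system), the drawn 3D contour points could be projected into the webcam's 2D pixel space with OpenCV, assuming the webcam pose (rvec, tvec) and intrinsics (cameraMatrix, distCoeffs) are available at capture time; the function and variable names below are illustrative.

#include <opencv2/opencv.hpp>
#include <vector>

// Project the 3D contour drawn on the detected surface into 2D pixel space.
std::vector<cv::Point2f> contourToPixelSpace(
        const std::vector<cv::Point3f>& contourWorld,   // drawn 3D contour points
        const cv::Mat& rvec, const cv::Mat& tvec,       // webcam pose at capture time
        const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs) {
    std::vector<cv::Point2f> contourPixels;
    cv::projectPoints(contourWorld, rvec, tvec, cameraMatrix, distCoeffs, contourPixels);
    return contourPixels;   // later rasterized into the inpainting mask
}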
3.2 Tracking
Once the object was selected, we needed to track it between frames. We used the OpenCV [4] implementations of Shi and Tomasi's "Good Features to Track" [21] and Bouguet's "Pyramidal implementation of the affine Lucas Kanade feature tracker" [2].
On the original frame we ran cv::goodFeaturesToTrack to find 1000 features and then culled them to only those in the selected area. In subsequent frames we ran cv::calcOpticalFlowPyrLK on the features to see which ones were still present. We then created a velocity vector from the centroids of the features still present in both the original frame and the update frame, and applied that velocity vector to the original selection area to find the new selection area to inpaint.
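A minimal sketch of this tracking step is shown below, assuming grayscale frames and an 8-bit mask of the selection; the parameter values and the function itself are illustrative rather than our exact implementation.

#include <opencv2/opencv.hpp>
#include <vector>

// Detect features inside the selection on the first call, track them with
// pyramidal Lucas-Kanade, and return the average displacement (the velocity
// vector applied to the selection area).
cv::Point2f selectionShift(const cv::Mat& prevGray, const cv::Mat& currGray,
                           const cv::Mat& selectionMask,          // 8-bit mask of the drawn contour
                           std::vector<cv::Point2f>& prevPts) {
    if (prevPts.empty())   // first call: detect up to 1000 features inside the selection
        cv::goodFeaturesToTrack(prevGray, prevPts, 1000, 0.01, 5, selectionMask);

    std::vector<cv::Point2f> currPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

    // Average displacement of the features still present in both frames
    // (equivalent to the difference of their centroids).
    cv::Point2f shift(0.f, 0.f);
    int kept = 0;
    for (size_t i = 0; i < currPts.size(); ++i) {
        if (!status[i]) continue;
        shift += currPts[i] - prevPts[i];
        ++kept;
    }
    if (kept > 0) shift *= 1.0f / kept;

    prevPts = currPts;   // reuse the tracked points for the next frame
    return shift;        // applied to the selection area before inpainting
}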
Figure 2: PatchMatch algorithm [CC BY-SA 3.0, Alexandre Delesse]
3.3 Inpainting
Our approach uses the PatchMatch algorithm for inpainting. PatchMatch finds nearest-neighbor matches between image patches: good matches are found via random sampling, and the natural coherence of imagery allows them to propagate quickly across patches.
In the first step of the PatchMatch algorithm, the nearest-neighbor field is initialized by filling it with random offset values or with information available from an earlier pass. An iterative process is then applied to the nearest-neighbor field. During this process, offsets are examined in scan order and good patches are propagated to adjacent pixels: if a coherent mapping was good earlier, the mapping is filled into the adjacent pixels of the same coherent region. A random search is then carried out in the neighborhood of the best offset found so far. The halting criterion for the iterations depends on the convergence of the nearest-neighbor field, which Barnes et al. [1] found to occur after about 4-5 iterations.
Figure 2 illustrates the basis of PatchMatch. The grey region in the ellipse is the area of the image that needs to be inpainted. The entire image is scanned for the best matches for the patches surrounding the selected region, and these patches are propagated accordingly.
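The following is a minimal, single-scale sketch of the nearest-neighbor field search at the core of PatchMatch (random initialization, propagation, and random search), written against OpenCV types. It omits the hole mask, the reverse scan order on alternate iterations, coarse-to-fine scales, and the final patch copying/voting used for actual inpainting; the names and parameters are illustrative, not our production code.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <limits>
#include <random>

static const int P = 7;   // patch size (P x P)

// Sum of squared differences between a P x P patch in a and one in b.
static double patchSSD(const cv::Mat& a, cv::Point pa, const cv::Mat& b, cv::Point pb) {
    double d = 0.0;
    for (int dy = 0; dy < P; ++dy)
        for (int dx = 0; dx < P; ++dx) {
            double diff = (double)a.at<uchar>(pa.y + dy, pa.x + dx)
                        - (double)b.at<uchar>(pb.y + dy, pb.x + dx);
            d += diff * diff;
        }
    return d;
}

// For every patch in src, find an offset into dst pointing at a similar patch.
cv::Mat patchMatchNNF(const cv::Mat& src, const cv::Mat& dst, int iters = 5) {
    std::mt19937 rng(0);
    int H = src.rows - P, W = src.cols - P;
    cv::Mat nnf(H, W, CV_32SC2);   // per-pixel (dx, dy) offset into dst

    auto cost = [&](int x, int y, cv::Vec2i off) {
        int tx = x + off[0], ty = y + off[1];
        if (tx < 0 || ty < 0 || tx > dst.cols - P - 1 || ty > dst.rows - P - 1)
            return std::numeric_limits<double>::max();
        return patchSSD(src, cv::Point(x, y), dst, cv::Point(tx, ty));
    };

    // 1. Random initialization of the nearest-neighbor field.
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int tx = (int)(rng() % (dst.cols - P));
            int ty = (int)(rng() % (dst.rows - P));
            nnf.at<cv::Vec2i>(y, x) = cv::Vec2i(tx - x, ty - y);
        }

    for (int it = 0; it < iters; ++it) {
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                cv::Vec2i best = nnf.at<cv::Vec2i>(y, x);
                double bestCost = cost(x, y, best);

                // 2. Propagation: adopt a neighbor's offset if it is better.
                if (x > 0) {
                    cv::Vec2i cand = nnf.at<cv::Vec2i>(y, x - 1);
                    double c = cost(x, y, cand);
                    if (c < bestCost) { best = cand; bestCost = c; }
                }
                if (y > 0) {
                    cv::Vec2i cand = nnf.at<cv::Vec2i>(y - 1, x);
                    double c = cost(x, y, cand);
                    if (c < bestCost) { best = cand; bestCost = c; }
                }

                // 3. Random search in an exponentially shrinking window around the best offset.
                for (int r = std::max(dst.cols, dst.rows); r >= 1; r /= 2) {
                    cv::Vec2i cand(best[0] + (int)(rng() % (2 * r)) - r,
                                   best[1] + (int)(rng() % (2 * r)) - r);
                    double c = cost(x, y, cand);
                    if (c < bestCost) { best = cand; bestCost = c; }
                }
                nnf.at<cv::Vec2i>(y, x) = best;
            }
    }
    return nnf;   // offsets are then used to copy source patches into the hole
}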
3.3.1 Algorithm for Video Sequence
In our implementation for HoloSwap, this inpainting algorithm is used to "remove" the selected objects in pixel-space. We first select the region of interest, or object, in the visible scene through interaction with the HoloLens, as in Figure 3a. The selected region of interest is then used to create a mask for inpainting. The initial mask is white in the region of interest and black everywhere else, as shown in Figure 3b.

(a) Cow Selected (b) Full Inpainting Mask (c) Thin Inpainting Mask (d) Thick Inpainting Mask
Figure 3: Example masks for inpainting a selection.

The first frame of the video sequence is inpainted in pixel-space using the initial mask, and patterns from image patches of the entire frame are used to fill in the region of interest. For the consecutive frames, the mask is updated to account for slight movements in the position of the web camera on the HoloLens. The initial mask is eroded using the OpenCV [4] implementation of erosion, and the eroded mask is subtracted from the previous mask to give a ring of selection area. This ring-shaped mask, as in Figure 3c, is the new update mask for the current frame. The update mask is also shifted along with the object's location by tracking the selected object's features, so that the mask remains in the right location relative to the selected object. This process is repeated for every frame by updating the mask from the previous frame, thereby achieving object removal by inpainting at the location of the object in pixel-space.
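A minimal sketch of this mask handling with OpenCV is given below, assuming the selection contour is already in pixel coordinates; the iteration count, kernel, and function names are illustrative and do not reproduce our exact implementation.

#include <opencv2/opencv.hpp>
#include <vector>

// Initial mask: white inside the selected contour, black everywhere else (Figure 3b).
cv::Mat initialMask(cv::Size frameSize, const std::vector<cv::Point>& contourPx) {
    cv::Mat mask = cv::Mat::zeros(frameSize, CV_8UC1);
    std::vector<std::vector<cv::Point>> polys{contourPx};
    cv::fillPoly(mask, polys, cv::Scalar(255));
    return mask;
}

// Ring-shaped update mask: shift the mask with the tracked selection, erode it,
// and subtract the eroded interior, leaving a ring along the edge (Figure 3c/3d).
cv::Mat ringUpdateMask(const cv::Mat& mask, int erosionIters, cv::Point2f shift) {
    cv::Mat shifted;
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, shift.x, 0, 1, shift.y);
    cv::warpAffine(mask, shifted, M, mask.size());

    cv::Mat eroded;
    cv::erode(shifted, eroded, cv::Mat(), cv::Point(-1, -1), erosionIters);
    return shifted - eroded;
}

A larger erosionIters value leaves a thicker ring, which corresponds to the thick mask discussed in the next section.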
3.3.2 Inpainting and Mask Updating Methods
Three methods of inpainting were incorporated in an attempt to improve efficiency. The first method involved inpainting by selecting matching texture patches from the entire frame; the PatchMatch algorithm scans the entire frame for suitable matches. In the second method, we use only the area around the selected object for inpainting in order to improve computation time. To further improve performance, we copied the inpainted portion of the first frame into the area enclosed by the initial mask to maintain consistency between frames and updated the inpainting using the area around the selected object. In the third method, we used the ring-shaped mask described in Section 3.3.1 to further reduce computation time. The size of the ring was varied by altering the number of iterations in OpenCV's erosion function to examine how it affected inpainting accuracy.
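As a sketch of the third method, the snippet below reuses the previous frame's inpainted pixels inside the initial mask and re-inpaints only the ring. OpenCV's cv::inpaint (Telea) stands in here for the PatchMatch-based routine we actually use, and the names are illustrative.

#include <opencv2/opencv.hpp>
#include <opencv2/photo.hpp>

cv::Mat updateFrame(const cv::Mat& currFrame, const cv::Mat& prevInpainted,
                    const cv::Mat& initialMask, const cv::Mat& ringMask) {
    cv::Mat frame = currFrame.clone();
    // Reuse the previous frame's inpainted pixels for temporal consistency.
    prevInpainted.copyTo(frame, initialMask);
    // Re-inpaint only the ring along the selection edge (thin or thick mask,
    // depending on the erosion iteration count).
    cv::Mat result;
    cv::inpaint(frame, ringMask, result, 3 /*radius*/, cv::INPAINT_TELEA);
    return result;
}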
3.4 Display
We tested two forms of displaying the results: video inpainting rendered in the field of view, and object replacement in a still frame. For both approaches, the display image is placed on a plane (hereafter referred to as the "display plane") in the scene. To display the image correctly, we get the world-to-camera matrix and projection matrix from the HoloLens webcam when the image is taken. Then, the matrices are used in the shader to calculate the UV texture space for the plane in the following way: First, the plane's corner vertex positions are calculated in world space. Second, the world-space vertex positions are transformed so that they are relative to the physical web camera on the HoloLens. Third, the camera-relative vertices are converted into normalized clipping space. Lastly, the x and y components of the vertices are used to define the UV space of the texture, which is used to correctly apply the image texture to the plane.
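A minimal sketch of this per-vertex UV calculation is shown below, written in C++ with OpenCV matrix types rather than the actual Unity shader; the matrices are the per-capture world-to-camera and projection matrices mentioned above, and the mapping from normalized device coordinates to [0, 1] UV space (including any axis flip) is an illustrative convention rather than the exact one we use.

#include <opencv2/opencv.hpp>

// Map one corner vertex of the display plane to a UV coordinate.
cv::Point2f planeVertexToUV(const cv::Matx44f& worldToCamera,
                            const cv::Matx44f& projection,
                            const cv::Vec4f& vertexWorld) {      // homogeneous world position
    // Steps 1-2: transform the world-space vertex so it is relative to the webcam.
    cv::Vec4f cam = worldToCamera * vertexWorld;
    // Step 3: project into normalized clipping space (perspective divide).
    cv::Vec4f clip = projection * cam;
    float ndcX = clip[0] / clip[3];
    float ndcY = clip[1] / clip[3];
    // Step 4: use the x and y components to define UV space, mapping [-1, 1] to [0, 1].
    return cv::Point2f(ndcX * 0.5f + 0.5f, ndcY * 0.5f + 0.5f);
}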
3.4.1 Video Field of View
For the video field of view, the display plane was placed slightly in front of the user's view. This was done by positioning the display plane and rotating it based on the camera-to-world matrix. Once the inpainted display plane is placed, we repeat the procedure: receiving the next frame, calling the inpainting update function, and displaying the result. We implemented this both as a solid display plane and with the edges of the display plane vignetted to transparency.
3.4.2 Replacement in Still Frame
Object replacement in a still frame is done in four steps. First, we retrieve the selected object's 2D center position in pixel-space and the inpainted frame image from the inpainting algorithm. Second, we unproject a ray from the 2D center position onto the 3D real-world surface mesh to find an intersection point where we can place the display plane. Third, if an intersection is found, we build the display plane with the inpainted image using the steps described previously and place it at the intersection. Lastly, we place the replacement object at the same intersection point.
To prevent the display plane from occluding the replacement object, we render both objects separately and render the replacement object last, so that it appears on top of the display plane.
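A minimal sketch of the unprojection step, again in C++ with OpenCV types rather than the actual Unity code: it assumes the inverses of the same per-capture matrices used for the display plane, the y-axis flip and near-plane depth are illustrative conventions, and the intersection test against the scanned surface mesh is omitted.

#include <opencv2/opencv.hpp>

struct Ray { cv::Vec3f origin; cv::Vec3f direction; };

// Unproject the selection's 2D center pixel into a world-space ray.
Ray pixelToWorldRay(const cv::Point2f& pixel, const cv::Size& imageSize,
                    const cv::Matx44f& projectionInv,    // inverse projection matrix
                    const cv::Matx44f& cameraToWorld) {  // inverse of world-to-camera
    // Pixel coordinates -> normalized device coordinates in [-1, 1].
    float ndcX = 2.0f * pixel.x / imageSize.width - 1.0f;
    float ndcY = 1.0f - 2.0f * pixel.y / imageSize.height;   // flip y (image rows grow downward)

    // Unproject a point on the near plane into camera space, then into world space.
    cv::Vec4f cam = projectionInv * cv::Vec4f(ndcX, ndcY, -1.0f, 1.0f);
    float invW = 1.0f / cam[3];
    cv::Vec4f camPoint(cam[0] * invW, cam[1] * invW, cam[2] * invW, 1.0f);
    cv::Vec4f worldPoint = cameraToWorld * camPoint;

    // The ray starts at the camera position and passes through the unprojected point.
    cv::Vec4f camOrigin = cameraToWorld * cv::Vec4f(0.0f, 0.0f, 0.0f, 1.0f);
    cv::Vec3f dir(worldPoint[0] - camOrigin[0],
                  worldPoint[1] - camOrigin[1],
                  worldPoint[2] - camOrigin[2]);
    return Ray{ cv::Vec3f(camOrigin[0], camOrigin[1], camOrigin[2]), cv::normalize(dir) };
}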
3.5 System Design
We used Microsoft Visual Studio and Unity to create this project. The application was written in two parts: a dynamic-link library (DLL) for inpainting, and selection and display on the HoloLens. The inpainting DLL required OpenCV, so we had to build and add OpenCV's pre-alpha DLLs, which were incomplete. We also had a series of C# scripts that managed data transfer with the inpainting DLL, taking screenshots, placing display planes correctly, and capturing gestures.
4 RESULTS
We tested three different ways to display the inpainting: (1) the display plane and a bunny placed at the inpainted depth, (2) a solid display plane covering the field of view, and (3) a vignetted display plane covering the field of view. Examples of these displays are shown in Figure 4. In all three cases, having the frames update only every 4 seconds was not ideal and led to discontinuity when the user moved too much between frames, causing ghosting until the next frame updated.
We found that object replacement, displaying a plane and a white bunny at the inpainted depth, was the best way to obscure the inpainted object. When this was done, the majority of the inpainted area was covered by the bunny. We also chose white for the bunny because it appeared the least transparent. We further found that inpainting worked better on a light or white background, though our example images use a black background.
We tested both a solid and a vignetted display plane covering the user's field of view and found that both display settings had drawbacks. The solid display plane was better at inpainting the object, because no portion of the plane was transparent. The issue was that the user could easily see the display plane's edges, which was potentially distracting. The vignetted display plane's edges melted into the background, but the inpainting was semi-transparent and therefore did not cover the object as well.

Figure 4: The original scene and example views of the three ways we tested object removal and replacement. (a) Original scene. (b) Vignetted inpainted field-of-view image overlaid. (c) Simulated image of the inpainted scene with the Stanford Bunny [24] replacing the phaser. (d) Inpainted field-of-view image overlaid.

Table 1: Inpainting runtime. Different masks drawing patches from different-sized areas were tested on a frame. The selected area to inpaint was an ellipse with width and height 1/16th of the frame (1280 x 720).

Inpainting Mask   Area Inpainted From   Seconds
Full mask         Full frame            10+
Full mask         1/4 frame             4.85
Thin mask         1/4 frame             4.05
Thick mask        1/4 frame             4.2
5 ASSESSMENT
Assessment of HoloSwap was based on the accuracy and efficiency of inpainting and object replacement. Our main goal was to assess the viability of real-time object removal and replacement on the Microsoft HoloLens, and hence runtime was an important factor in assessing our results.
5.1 Object Removal in Video Sequence
For the object removal test cases, we used an elliptical contour with a size of about 1/16th of the frame. The selected object was centered in the scene in these cases for consistency of evaluation. Table 1 shows the effective runtime for each of the methods used. We first used the initial mask, as in Figure 3b, for inpainting, with the patches extracted from the entire 1280x720 frame. For this method, we found that obtaining the inpainted result took about 10 seconds on average. In order to reduce the runtime, we reduced the area of the frame from which patches were extracted for inpainting to about a quarter of the frame, which gave us a runtime of 4.85 seconds on average.
To further improve the runtime, we retained the inpainting information from the first frame and reduced the mask area such that only the edges of the selected object's region were inpainted, to account for slight movements. We used 10 iterations of OpenCV's erosion function for the mask, which gave us a runtime of 4.05 seconds but significantly deteriorated the quality of inpainting. To improve the quality, we used a thicker mask by adjusting the number of erosion iterations to 20, which had a runtime of 4.2 seconds. Though this method was slightly slower than the one with the thin mask, we retained it as our final method due to its higher inpainting quality. There was a trade-off between inpainting quality and runtime, and we chose the method that gave the best results among those explored.
5.2 Object Replacement
We used the Stanford Bunny mesh to replace the selected object. Object replacement was performed by inpainting the selected object in pixel-space and then placing the bunny at the location of the previously selected object. In most of the test results, the bunny mesh appeared to be placed at the right location and occluded the previously selected object completely. The runtime for object replacement was similar to that of inpainting the selected object using the full mask and information from the full frame. The results were also significantly better for selected objects that had sharp contrast with their background. Object replacement seemed to produce better results on the augmented reality device than removal of the selected object alone.
6 DISCUSSION
There are some limitations of the HoloLens that affect the user experience when using HoloSwap. First, the user is still able to see the real object after inpainting, since the HoloLens uses a see-through display. However, when HoloSwap is used in an environment where the selected object is surrounded by white background objects and the HoloLens display brightness is at 100%, the real object becomes hard to see due to the display brightness.
Second, every user will have a different experience because of differences in eye dominance. For many users, the display plane will look slightly shifted, since the HoloLens display does not account for eye dominance.
Third, there is a limit to how much we can inpaint and display at once. Because the HoloLens webcam has a smaller field of view than the human eye, we are unable to inpaint the full field of view that a human can see. In addition, since the HoloLens has only a small field-of-view display, it cannot show all of the content generated by HoloSwap. Consequently, if the selected object is large or close to the user, the user cannot completely swap the selected object with the replacement object.
7 CONCLUSION
We tested the HoloLens's viability for real-time inpainting and object removal and replacement. We achieved near-real-time inpainting on the HoloLens and believe that, with a more efficient algorithm, real-time results are possible. We found that using a thick, ring-shaped mask was the best way to implement video inpainting with consistency between frames. Among our three test cases (object replacement, object inpainting with a solid plane, and object inpainting with a vignetted plane), we found that object replacement had the most believable results. The HoloLens has certain disadvantages, such as its limited field of view, eye separation, and see-through display. However, object replacement on see-through, head-mounted displays could be reasonable under certain circumstances, such as with a white or light background and white or light replacement objects.
ACKNOWLEDGEMENTS
The authors wish to thank Matthew Turk and Tobias Höllerer for their support. We would also like to thank Adam and Brandon for letting us use the HoloLens in time of need.
REFERENCES
[1] C. Barnes, E. Shechtman, A. Finkelstein, and D. Goldman. Patch-
match: a randomized correspondence algorithm for structural image
editing. ACM Transactions on Graphics-TOG, 28(3):24, 2009.
[2] J.-Y. Bouguet. Pyramidal implementation of the affine lucas kanade
feature tracker description of the algorithm. Intel Corporation, 5(1-
10):4, 2001.
[3] D. A. Bowman, E. Kruijff, J. J. LaViola, and I. Poupyrev. 3D User In-
terfaces: Theory and Practice. Addison Wesley Longman Publishing
Co., Inc., Redwood City, CA, USA, 2004.
[4] G. Bradski. OpenCV library. Dr. Dobb’s Journal of Software Tools, 2000.
[5] A. Criminisi, P. Perez, and K. Toyama. Object removal by exemplar-
based inpainting. In Computer Vision and Pattern Recognition, 2003.
Proceedings. 2003 IEEE Computer Society Conference on, volume 2,
pages II–721. IEEE, 2003.
[6] L.-J. Deng, T.-Z. Huang, and X.-L. Zhao. Exemplar-based im-
age inpainting using a modified priority definition. PloS one,
10(10):e0141199, 2015.
[7] I. Drori, D. Cohen-Or, and H. Yeshurun. Fragment-based image com-
pletion. In ACM Transactions on graphics (TOG), volume 22:3, pages
303–312. ACM, 2003.
[8] A. Enomoto and H. Saito. Diminished reality using multiple handheld
cameras. In Proc. ACCV, volume 7, pages 130–135. Citeseer, 2007.
[9] P. Harrison. A non-hierarchical procedure for re-synthesis of complex
textures. University of West Bohemia, 2001.
[10] J. Herling and W. Broll. Advanced self-contained object removal for
realizing real-time diminished reality in unconstrained environments.
In Mixed and Augmented Reality (ISMAR), 2010 9th IEEE Interna-
tional Symposium on, pages 207–212. IEEE, 2010.
[11] J. Herling and W. Broll. High-quality real-time video inpainting with
pixmix. IEEE transactions on visualization and computer graphics,
20(6):866–879, 2014.
[12] N. Kawai, M. Yamasaki, T. Sato, and N. Yokoya. Ar marker hid-
ing based on image inpainting and reflection of illumination changes.
In Mixed and Augmented Reality (ISMAR), 2012 IEEE International
Symposium on, pages 293–294. IEEE, 2012.
[13] A. C. Kokaram, R. D. Morris, W. J. Fitzgerald, and P. J. Rayner. De-
tection of missing data in image sequences. IEEE Transactions on
Image Processing, 4(11):1496–1508, 1995.
[14] O. Korkalo, M. Aittala, and S. Siltanen. Light-weight marker hiding
for augmented reality. In Mixed and Augmented Reality (ISMAR),
2010 9th IEEE International Symposium on, pages 247–248. IEEE,
2010.
[15] O. Miksik, V. Vineet, M. Lidegaard, R. Prasaath, M. Nießner,
S. Golodetz, S. L. Hicks, P. Perez, S. Izadi, and P. H. S. Torr. The
semantic paintbrush: Interactive 3d mapping and recognition in large
outdoor spaces. Proceedings of the 33nd annual ACM conference on
Human factors in computing systems (CHI), 2015.
[16] A. Newson, A. Almansa, M. Fradet, Y. Gousseau, and P. Pérez. Video inpainting of complex scenes. SIAM Journal on Imaging Sciences, 7(4):1993–2019, 2014.
[17] B. Nuernberger, K.-C. Lien, T. Höllerer, and M. Turk. Interpreting 2d gesture annotations in 3d augmented reality. In 2016 IEEE Symposium on 3D User Interfaces (3DUI), pages 149–158, March 2016.
[18] O. Oda and S. Feiner. 3d referencing techniques for physical objects in
shared augmented reality. In 2012 IEEE International Symposium on
Mixed and Augmented Reality (ISMAR), pages 207–215, Nov 2012.
[19] A. Olwal, H. Benko, and S. Feiner. Senseshapes: Using statistical ge-
ometry for object selection in a multimodal augmented reality system.
In Proceedings of the 2Nd IEEE/ACM International Symposium on
Mixed and Augmented Reality, ISMAR ’03, pages 300–, Washington,
DC, USA, 2003. IEEE Computer Society.
[20] S. D. Rane, G. Sapiro, and M. Bertalmio. Structure and texture filling-
in of missing image blocks in wireless transmission and compression
applications. IEEE Transactions on image processing, 12(3):296–303,
2003.
[21] J. Shi and C. Tomasi. Good features to track. In Computer Vision
and Pattern Recognition, 1994. Proceedings CVPR’94., 1994 IEEE
Computer Society Conference on, pages 593–600. IEEE, 1994.
[22] S. Siltanen. Diminished reality for augmented reality interior design.
The Visual Computer, pages 1–16, 2015.
[23] S. Siltanen, H. Sarasp, and J. Karvonen. [demo] a complete interior
design solution with diminished reality. In 2014 IEEE International
Symposium on Mixed and Augmented Reality (ISMAR), pages 371–
372, Sept 2014.
[24] G. Turk and M. Levoy. The Stanford Bunny, 1993.
More Related Content

What's hot

A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...
 A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv... A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...
A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...Chennai Networks
 
Montage4D: Interactive Seamless Fusion of Multiview Video Textures
Montage4D: Interactive Seamless Fusion of Multiview Video TexturesMontage4D: Interactive Seamless Fusion of Multiview Video Textures
Montage4D: Interactive Seamless Fusion of Multiview Video TexturesRuofei Du
 
Simulation of collision avoidance by navigation
Simulation of collision avoidance by navigationSimulation of collision avoidance by navigation
Simulation of collision avoidance by navigationeSAT Publishing House
 
Overview Of Video Object Tracking System
Overview Of Video Object Tracking SystemOverview Of Video Object Tracking System
Overview Of Video Object Tracking SystemEditor IJMTER
 
Soft Shadow Rendering based on Real Light Source Estimation in Augmented Reality
Soft Shadow Rendering based on Real Light Source Estimation in Augmented RealitySoft Shadow Rendering based on Real Light Source Estimation in Augmented Reality
Soft Shadow Rendering based on Real Light Source Estimation in Augmented RealityWaqas Tariq
 
Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...Tomohiro Fukuda
 
Detection and Tracking of Moving Object: A Survey
Detection and Tracking of Moving Object: A SurveyDetection and Tracking of Moving Object: A Survey
Detection and Tracking of Moving Object: A SurveyIJERA Editor
 
Mixed Reality: Pose Aware Object Replacement for Alternate Realities
Mixed Reality: Pose Aware Object Replacement for Alternate RealitiesMixed Reality: Pose Aware Object Replacement for Alternate Realities
Mixed Reality: Pose Aware Object Replacement for Alternate RealitiesAlejandro Franceschi
 
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...Kan Ouivirach, Ph.D.
 
Object detection technique using bounding box algorithm for
Object detection technique using bounding box algorithm forObject detection technique using bounding box algorithm for
Object detection technique using bounding box algorithm forVESIT,Chembur,Mumbai
 
Object tracking
Object trackingObject tracking
Object trackingchirase44
 
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Tomohiro Fukuda
 
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...ijcsa
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...CSCJournals
 
Copy of 3 d report
Copy of 3 d reportCopy of 3 d report
Copy of 3 d reportVirajjha
 
Multiple Object Tracking
Multiple Object TrackingMultiple Object Tracking
Multiple Object TrackingRainakSharma
 

What's hot (19)

Object tracking
Object trackingObject tracking
Object tracking
 
A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...
 A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv... A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...
A Genetic Algorithm-Based Moving Object Detection For Real-Time Traffic Surv...
 
Montage4D: Interactive Seamless Fusion of Multiview Video Textures
Montage4D: Interactive Seamless Fusion of Multiview Video TexturesMontage4D: Interactive Seamless Fusion of Multiview Video Textures
Montage4D: Interactive Seamless Fusion of Multiview Video Textures
 
Simulation of collision avoidance by navigation
Simulation of collision avoidance by navigationSimulation of collision avoidance by navigation
Simulation of collision avoidance by navigation
 
Overview Of Video Object Tracking System
Overview Of Video Object Tracking SystemOverview Of Video Object Tracking System
Overview Of Video Object Tracking System
 
Soft Shadow Rendering based on Real Light Source Estimation in Augmented Reality
Soft Shadow Rendering based on Real Light Source Estimation in Augmented RealitySoft Shadow Rendering based on Real Light Source Estimation in Augmented Reality
Soft Shadow Rendering based on Real Light Source Estimation in Augmented Reality
 
Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...
 
Survey 1 (project overview)
Survey 1 (project overview)Survey 1 (project overview)
Survey 1 (project overview)
 
Detection and Tracking of Moving Object: A Survey
Detection and Tracking of Moving Object: A SurveyDetection and Tracking of Moving Object: A Survey
Detection and Tracking of Moving Object: A Survey
 
Mixed Reality: Pose Aware Object Replacement for Alternate Realities
Mixed Reality: Pose Aware Object Replacement for Alternate RealitiesMixed Reality: Pose Aware Object Replacement for Alternate Realities
Mixed Reality: Pose Aware Object Replacement for Alternate Realities
 
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...
Extracting the Object from the Shadows: Maximum Likelihood Object/Shadow Disc...
 
Object detection technique using bounding box algorithm for
Object detection technique using bounding box algorithm forObject detection technique using bounding box algorithm for
Object detection technique using bounding box algorithm for
 
F1063337
F1063337F1063337
F1063337
 
Object tracking
Object trackingObject tracking
Object tracking
 
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
 
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdo...
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
 
Copy of 3 d report
Copy of 3 d reportCopy of 3 d report
Copy of 3 d report
 
Multiple Object Tracking
Multiple Object TrackingMultiple Object Tracking
Multiple Object Tracking
 

Similar to 291A_report_Hannah-Deepa-YunSuk

visual realism in geometric modeling
visual realism in geometric modelingvisual realism in geometric modeling
visual realism in geometric modelingsabiha khathun
 
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...ijsrd.com
 
A New Algorithm for Tracking Objects in Videos of Cluttered Scenes
A New Algorithm for Tracking Objects in Videos of Cluttered ScenesA New Algorithm for Tracking Objects in Videos of Cluttered Scenes
A New Algorithm for Tracking Objects in Videos of Cluttered ScenesZac Darcy
 
Gesture detection by virtual surface
Gesture detection by virtual surfaceGesture detection by virtual surface
Gesture detection by virtual surfaceAshish Garg
 
VR_Module_3_PPT.pptx
VR_Module_3_PPT.pptxVR_Module_3_PPT.pptx
VR_Module_3_PPT.pptxvrfv
 
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorProposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorQUESTJOURNAL
 
Implementation of Object Tracking for Real Time Video
Implementation of Object Tracking for Real Time VideoImplementation of Object Tracking for Real Time Video
Implementation of Object Tracking for Real Time VideoIDES Editor
 
Correcting garment set deformalities on virtual human model using transparanc...
Correcting garment set deformalities on virtual human model using transparanc...Correcting garment set deformalities on virtual human model using transparanc...
Correcting garment set deformalities on virtual human model using transparanc...eSAT Publishing House
 
10.1109@ecs.2015.7124874
10.1109@ecs.2015.712487410.1109@ecs.2015.7124874
10.1109@ecs.2015.7124874Ganesh Raja
 
Development of Human Tracking System For Video Surveillance
Development of Human Tracking System For Video SurveillanceDevelopment of Human Tracking System For Video Surveillance
Development of Human Tracking System For Video Surveillancecscpconf
 
Automatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSMAutomatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSMAmit Raikar
 
K-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective BackgroundK-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective BackgroundIJCSIS Research Publications
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfSofianeHassine2
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Real time implementation of object tracking through
Real time implementation of object tracking throughReal time implementation of object tracking through
Real time implementation of object tracking througheSAT Publishing House
 

Similar to 291A_report_Hannah-Deepa-YunSuk (20)

visual realism in geometric modeling
visual realism in geometric modelingvisual realism in geometric modeling
visual realism in geometric modeling
 
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
 
A New Algorithm for Tracking Objects in Videos of Cluttered Scenes
A New Algorithm for Tracking Objects in Videos of Cluttered ScenesA New Algorithm for Tracking Objects in Videos of Cluttered Scenes
A New Algorithm for Tracking Objects in Videos of Cluttered Scenes
 
DragGan AI.pdf
DragGan AI.pdfDragGan AI.pdf
DragGan AI.pdf
 
Gesture detection by virtual surface
Gesture detection by virtual surfaceGesture detection by virtual surface
Gesture detection by virtual surface
 
VR_Module_3_PPT.pptx
VR_Module_3_PPT.pptxVR_Module_3_PPT.pptx
VR_Module_3_PPT.pptx
 
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorProposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
 
Implementation of Object Tracking for Real Time Video
Implementation of Object Tracking for Real Time VideoImplementation of Object Tracking for Real Time Video
Implementation of Object Tracking for Real Time Video
 
Correcting garment set deformalities on virtual human model using transparanc...
Correcting garment set deformalities on virtual human model using transparanc...Correcting garment set deformalities on virtual human model using transparanc...
Correcting garment set deformalities on virtual human model using transparanc...
 
10.1109@ecs.2015.7124874
10.1109@ecs.2015.712487410.1109@ecs.2015.7124874
10.1109@ecs.2015.7124874
 
Development of Human Tracking System For Video Surveillance
Development of Human Tracking System For Video SurveillanceDevelopment of Human Tracking System For Video Surveillance
Development of Human Tracking System For Video Surveillance
 
Automatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSMAutomatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSM
 
2001714
20017142001714
2001714
 
K-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective BackgroundK-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective Background
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdf
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
ei2106-submit-opt-415
ei2106-submit-opt-415ei2106-submit-opt-415
ei2106-submit-opt-415
 
D018112429
D018112429D018112429
D018112429
 
Real time implementation of object tracking through
Real time implementation of object tracking throughReal time implementation of object tracking through
Real time implementation of object tracking through
 
Animation
AnimationAnimation
Animation
 

291A_report_Hannah-Deepa-YunSuk

  • 1. HoloSwap: Object Removal and Replacement using Microsoft Hololens Yun Suk Chang∗ Deepashree Gurumurthy† Hannah Wolfe‡ ABSTRACT Object removal and replacement in augmented reality has applica- tions in interior design, marker hiding and many other fields. This project explores the viability of object removal and replacement for the Microsoft HoloLens. We implemented gesture based ob- ject selection, video inpainting, tracking and 3 different ways to display our results. We placed (1) the display plane and a white Stanford bunny[24] at the inpainted depth, (2) a solid display plane with video inpainting covering the field of view, and (3) a vignetted view of the video inpainting. We found placing the display plane and a white Stanford bunny at the inpainted depth to have the most reasonable results. Due to the semi-transparency of the HoloLens replacing the object with a white mesh and having the object on a light background is important. We also tested 3 different inpainting masks for the updating frame. Our results show that the HoloLens can be a reasonable solution in certain cases for object replacement in augmented reality. 1 INTRODUCTION With increase in research development and popularity, the number of augmented reality (AR) application increased in many different areas. One of the popular AR application is object removal and re- placement. The object removal and replacement application ranges from interior design [23] to replacing a tracking marker in maker based AR [12], using diminished reality techniques. There has been much work in tablet or other mobile device based diminished real- ity application [8, 12, 14]. However, little is known for diminished reality on see-through, head-mounted displays. There are many pros to export diminish reality applications onto see-through, head-mounted display devices. First, the user is able to see the replaced or removed object with correct projection from their view rather than from camera’s projection. Second, the user gets better experience since the actual object is hidden from their view at all times while they are wearing the display, preventing from breaking out of the immersion. Lastly, in see-through dis- plays, only the diminished portion of the real world view is affected by display and camera distortions unlike the view from the camera where whole view is affected. In this work, we perform diminished reality on Microsoft HoloLens using inpainting technique to test the viability of object replacement in see-through, head-mounted display. We test the via- bility in three different ways: (1) placing an inpainted display frame and replacement object on the object selected for replacement to test the object replacement, (2) running video-based inpainting al- gorithm in view to test object removal in changing viewpoints, and (3) blending an inpaint display frame into the scene to test the limits of minimal seam achieved. Through our evaluation, we discovered the limitations of di- minished reality on see-through, head-mounted displays, includ- ing problem of eye separation and dominance, limitation of field of view, and inability to completely block out real world scene. ∗e-mail: ychang@cs.ucsb.edu †e-mail: deepashree@umail.ucsb.edu ‡e-mail: wolfe@cs.ucsb.edu However, our results show that in certain environments, see-though display device can achieve reasonable object replacement and re- moval. 
2 RELATED WORK 2.1 Object Selection in Augmented Reality There has been much work done in selecting object in 3D space using augmented reality. For example, Olwal et al. [19] used track- ing glove to point and cast a primitive object to the scene to select an area of interest, which is used for object selection by statistical analysis. On the other hand, Oda et al. [18] uses tracked Wii remote to select the object by moving an adjustable sphere into the object of interest. More recently, Miksik et al. [15] presented the Seman- tic Paintbrush for selecting the real-world objects via a ray-casting technique and semantic segmentation. Our drawing method can be considered a pointing or ray-casting technique, almost like using a precise can of spray-paint [3, 15, 17]. 2.2 Inpainting in Augmented Reality Enomoto et al.[8] implemented inpainting of objects using multi- ple cameras. Later predetermined object removal for augmented reality was implemented by Herling et al. [10]. Researchers have implemented inpainting to cover markers in augmented reality applications[12, 14]. In their papers, they do not remove objects, but just cover predetermined 2D markers with textures. We could not find any examples of video inpainting in an augmented reality head-mounted display. 2.3 Origins of Inpainting Inpainting was first done digitally in 1995 to fix scratches in film [13]. Harrison et al.proposed a non-hierarchical procedure for re- synthesis of complex textures [9]. An output image was generated by adding pixels selected from input image. This method produced good results but took a long time for implementation. Other researchers tried to accelerate Harrison’s algorithm, and while their results were potentially faster they caused more artifacts [7, 20]. Many of these papers tried to combine, texture synthesis, replicating textures ad infinitum for large regions using exemplar- based techniques, and inpainting, filling small gaps in images by propagating linear structures via diffusion. ”Fragment-based im- age completion” would leave blurry patches [7]. ”Structure and texture filling-in of missing image blocks in wireless transmission and compression applications,” would inpaint some sections and use texture synthesis in others causing inconsistencies [20]. Cri- minisi et al.was the first group to combine texture synthesis and inpainting together effectively and efficiently. They did this by in- painting based on a priority function instead of raster-scan order [5]. The PatchMatch algorithm proposed by Barnes et al.is the ba- sis for many modern inpainting algorithms. Their algorithm finds similar patches and copies data from surrounding patches to inpaint areas [1]. 2.4 State of the Art Inpainting Inpainting has been performed on video sequences in the recent years. ”Video Inpainting of Complex Scenes” [16] produces very
  • 2. good results of inpainting. The inpainting algorithm stores infor- mation from previous frames and thereby can also recreate miss- ing text. ”Real time video inpainting using PixMix” [11] also ac- counted for illumination changes between frames while inpainting. ”Exemplar-Based Image Inpainting Using a Modified Priority Definition” [6] produced high quality output images by propagat- ing patches based on their priority definition. Lately, novel inpaint- ing algorithms have been included in pipeline for modular real time diminished reality for indoor applications [22]. This however only proposed a method which could be used in augmented reality de- vices and did not use any such device for their implementation. 3 APPROACH There are many parts of the pipeline to build an interactive applica- tion to select, remove and replace objects in augmented reality. The desired object to be removed or replaced is first selected by inter- action with HoloLens. For object removal, we use a method of in- painting using PatchMatch [1]. The first frame is captured through HoloLens and the output of the first frame is an inpainted area at the location of the selected object. We used different approaches to inpaint the current frame. The first method was by using informa- tion from entire frame, the second method only used information surrounding the selected object and the last method used informa- tion from previous inpainted frame and small area around selected object in current frame. In order to achieve effective inpainting for consecutive frames at the location of the object in world space, we incorporated object tracking into our algorithm. This would help inpainting the current location of object in every frame, thereby accounting for slight changes in position of the camera. Object replacement is implemented by placing a plane in the scene, and placing a 3D mesh(Stanford Bunny mesh) over the location of the selected object. We also display the full screen capture in field of view, which shows the resulting inpainted sequence of frames. (a) (b) Figure 1: (a) Shows what the user would see while selecting an area. (b) Shows a selected object. 3.1 Object Selection For object selection, we let the user draw a 3D contour around the object to select it through the HoloLens. For the drawing method, we designed the Surface-Drawing method, which lets the user draw on the detected real world surface in following way: For the draw- ing gesture, we use pinch-and-drag gestures to start the drawing on the detected surface and release-pinch gestures to finish the draw- ing. To reduce noise in the gesture input, we sample the user’s drawing positions at 30 Hz and the finished annotation’s path points at 1 point per 1 mm. For drawing on the surface, we define the draw- ing position as the intersection between the detected surface mesh data and a ray cast from user’s head through the finger tip position. Consequently, as the user is drawing annotations, the user can eas- ily verge on the object of interest since the annotation is displayed at the detected surface. The drawing method is determined to be effective through pilot studies. The completed contour drawing would be transformed into 2D pixel-space coordinate system so that it could be used for the in- painting algorithm. 3.2 Tracking Once the object was selected, we needed to track the objects be- tween frames. 
We used OpenCV [4] implementations of Shi et al.’s paper ”Good Features to Track” [21] and Bouguet’s ”Pyramidal im- plementation of the affine Lucas Kanade feature tracker description of the algorithm” [2]. On the original frame we ran cv::goodFeaturesToTrack to find 1000 features and then culled the features to be only ones in the selected area. In the proceeding frames we ran cv::cvCalcOpticalFlowPyrLK on the features to see which ones were still present. We then created a velocity vector from the cen- troid the of features still present both in the original frame and the update frame. We updated the original selection area with the ve- locity vector to find the new selection area to inpaint. Figure 2: Patchmatch Algorithm [CC BY-SA 3.0, Alexandre Delesse] 3.3 Inpainting Our algorithm uses the PatchMatch algorithm for inpainting. The PatchMatch algorithm aims at finding nearest-neighbor matches be- tween image patches. Best matches are found via random sampling and the coherence of imagery allows for smooth propagation across patches. In the first step of the PatchMatch algorithm, the nearest- neighbor field is initialized by filling it with random offset values or information available earlier. Iterative process is then applied to the nearest-neighbour field. During this process, offsets are examined in scan order and good patches are then propagated towards adja- cent pixels. While propagating, if the coherent mapping is good ear- lier, then all of the mapping is filled into the adjacent pixels of same coherent region. A random search is carried out in the neighbour- hood to look for the best offset found. The halting criteria for the iterations depends upon the convergence of the nearest-neighbour fields, which was found to be around 4-5 iterations as per the article by Barnes et al. [1]. Figure 2 illustrates the basis of the PatchMatch. The grey region in the ellipse is the area of the image that needs to be inpainted. The entire image is scanned for the best match for the patches sur- rounding the selected region and these patches are propagated ac- cordingly. 3.3.1 Algorithm for Video Sequence In our implementation for HoloSwap, this algorithm for inpainting is used to ”remove” the selected objects in pixel-space. We first se- lect the region of interest or the object in the visible scene through interaction with the HoloLens as in Figure 3a. The selected region of interest is then used to create a mask for inpainting. The ini- tial mask is white in the region of interest and black everywhere
  • 3. (a) Cow Selected (b) Full Inpainting Mask (c) Thin Inpainting Mask (d) Thick Inpainting Mask Figure 3: Example masks for inpainting a selection. else as shown in Figure 3b. The first frame of the video sequence is inpainted in pixel-space using the initial mask and patterns from image patches of the entire frame is used to fill in the region of inter- est. For the consecutive frames, the mask is updated to account for slight movements in the position of web camera on the HoloLens. The initial mask is altered by using the OpenCV [4] implementation of erosion, and the initial mask is subtracted from the altered previ- ous mask to give a ring of selection area on mask. This ring-shaped mask as in Figure 3c is the new update mask for current frame. The update mask is also made to shift along with object’s location by tracking the selected object’s features so that the mask remains in the right location with respect to the selected object. This process is repeated for every frame by updating masks from the previous frame, thereby achieving object removal by inpainting at the loca- tion of object in pixel-space. 3.3.2 Inpainting and Mask Updating Methods Three methods of inpainting were incorporated in attempt to im- prove efficiency. The first method involved inpainting by selecting matching texture patches from the entire frame. The PatchMatch algorithm scans the entire frame for suitable matches. In the sec- ond method, we use only the area around the selected object for in- painting in order to improve computation time. To further improve performance, we copied the inpainted portion of the first frame into the area enclosed by initial mask to maintain consistency between frames and updated the inpainting using area around selected ob- ject. In the third method, we used the ring-shaped mask as men- tioned in Section 3.2.1 to further reduce computation time. The size of the ring was varied by altering number of iterations in OpenCV’s erosion function to check variation in accuracy of inpainting. 3.4 Display We tested two forms of displaying the results: video inpainting in field of view rendering and object replacement in still frame. For both approaches, the display image is placed on a plane (in future referred to as ”display plane”) in the scene. To display the image correctly, we get the world-to-camera matrix and projection matrix from the HoloLens webcam when the image is taken. Then, the matrices are used in the shader to calculate UV texture space for the plane in following ways: First, the plane’s corner vertex posi- tions are calculated in world space. Second, the world space vertex positions are transformed so that they are relative to the physical web camera on the HoloLens. Third, the camera relative vertices are converted into normalized clipping space. Lastly, x and y com- ponents of the vertices are used to define UV space of the texture, which is used to correctly apply the image texture on to the plane. 3.4.1 Video Field of View For the video field of view, the display plane was placed slightly in front of the user’s view. This was done by positioning the display plane and rotating it based on camera to world matrix. When the in- painted display plane is placed, we repeat the procedure, receiving next frame, calling the inpainting update function, and displaying it. We originally implemented this both as a solid display plane and with the edges of the display plane vignetted to transparency. 
3.3.2 Inpainting and Mask Updating Methods

Three inpainting methods were implemented in an attempt to improve efficiency. The first method selects matching texture patches from the entire frame; the PatchMatch algorithm scans the full frame for suitable matches. In the second method, we draw patches only from the area around the selected object in order to reduce computation time. To further improve performance, we copy the inpainted portion of the first frame into the area enclosed by the initial mask to maintain consistency between frames, and update the inpainting using only the area around the selected object. In the third method, we use the ring-shaped mask described in Section 3.3.1 to reduce computation time further. The size of the ring was varied by changing the number of iterations in OpenCV's erosion function in order to study its effect on the accuracy of the inpainting.

3.4 Display

We tested two forms of displaying the results: video inpainting rendered over the field of view and object replacement in a still frame. In both approaches, the display image is placed on a plane (hereafter the "display plane") in the scene. To display the image correctly, we retrieve the world-to-camera matrix and projection matrix from the HoloLens webcam at the time the image is taken. The matrices are then used in the shader to compute the UV texture coordinates for the plane as follows. First, the plane's corner vertex positions are computed in world space. Second, the world-space vertex positions are transformed so that they are relative to the physical web camera on the HoloLens. Third, the camera-relative vertices are converted into normalized clip space. Lastly, the x and y components of the vertices are used to define the UV coordinates of the texture, which correctly maps the image texture onto the plane.

3.4.1 Video Field of View

For the video field of view, the display plane is placed slightly in front of the user's view by positioning and rotating it according to the camera-to-world matrix. Once the inpainted display plane is placed, we repeat the procedure: receive the next frame, call the inpainting update function, and display the result. We implemented this both as a solid display plane and as a display plane whose edges are vignetted to transparency.

3.4.2 Replacement in Still Frame

Object replacement in a still frame is done in four steps. First, we retrieve the selected object's 2D center position in pixel-space and the inpainted frame image from the inpainting algorithm. Second, we unproject a ray from the 2D center position onto the 3D real-world surface mesh to find an intersection point where the display plane can be placed. Third, if an intersection is found, we build the display plane with the inpainted image using the steps described above and place it at the intersection. Lastly, we place the replacement object at the same intersection point.

To prevent the display plane from occluding the replacement object, we render the two separately and draw the replacement object last, so that it appears on top of the display plane.

3.5 System Design

We used Microsoft Visual Studio and Unity to build the project. The application consists of two parts: a dynamic-link library (DLL) for inpainting, and the selection and display logic running on the HoloLens. The inpainting DLL requires OpenCV, so we had to build and include OpenCV's pre-alpha DLLs, which were not yet complete. A set of C# scripts manages data transfer with the inpainting DLL, taking screenshots, placing display planes correctly, and capturing gestures.
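The paper does not specify the DLL's exact interface. The sketch below is a hypothetical example of how such a native inpainting DLL could expose C-style entry points to the Unity-side C# scripts via P/Invoke; all names and signatures are illustrative, not HoloSwap's actual API.

```cpp
// Hypothetical C-style exports for an inpainting DLL consumed from Unity via
// P/Invoke (MSVC). Names and signatures are illustrative only.
#include <opencv2/core.hpp>
#include <cstdint>

#define HOLOSWAP_API extern "C" __declspec(dllexport)

// Receive a captured RGBA frame and the selected region, and run the initial
// full-mask inpainting. Returns 0 on success.
HOLOSWAP_API int InpaintInitialFrame(uint8_t* rgbaPixels, int width, int height,
                                     int selX, int selY, int selW, int selH)
{
    cv::Mat frame(height, width, CV_8UC4, rgbaPixels);   // wraps the managed buffer
    cv::Rect selection(selX, selY, selW, selH);
    // ... build the initial mask and run the PatchMatch-based inpainting ...
    (void)frame; (void)selection;
    return 0;
}

// Update the inpainting for a new frame using the ring-shaped mask, writing the
// result back into the caller's buffer so Unity can upload it as a texture.
HOLOSWAP_API int InpaintUpdateFrame(uint8_t* rgbaPixels, int width, int height)
{
    cv::Mat frame(height, width, CV_8UC4, rgbaPixels);
    // ... shift the mask with the tracker and inpaint only the ring region ...
    (void)frame;
    return 0;
}
```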
4 RESULTS

We tested three different ways to display the inpainting: (1) the display plane and a bunny placed at the inpainted depth, (2) a solid display plane covering the field of view, and (3) a vignetted display plane covering the field of view. Examples of these displays are shown in Figure 4. In all three cases, having the frames update every 4 seconds was not ideal and led to discontinuity when the user moved too much between frames; this caused ghosting until the next frame updated.

Figure 4: The original scene and example views of the three ways we tested object removal and replacement. (a) Original scene; (b) vignetted inpainted field-of-view image overlaid; (c) simulated image of the inpainted scene with the Stanford bunny [24] replacing the phaser; (d) inpainted field-of-view image overlaid.

We found that object replacement, displaying a plane and a white bunny at the inpainted depth, was the best way to obscure the inpainted object. In this case, the majority of the inpainted area is covered by the bunny. We chose white for the bunny because it was the least transparent color on the display. We also found that inpainting worked better against a light or white background, although our example images use a black background.

We tested both a solid and a vignetted display plane covering the user's field of view and found that both had drawbacks. The solid display plane was better at hiding the object, because no portion of the plane is transparent, but the user could easily see the display plane's edges, which was potentially distracting. The vignetted display plane's edges melted into the background, but the inpainting was semi-transparent and therefore did not cover the object as well.

5 ASSESSMENT

HoloSwap was assessed based on the accuracy and efficiency of inpainting and object replacement. Our main goal was to assess the viability of real-time object removal and replacement on the Microsoft HoloLens, so runtime was an important factor in evaluating our results.

5.1 Object Removal in Video Sequence

For the object removal test cases, we used an elliptical contour roughly 1/16th the size of the frame, centered in the scene for consistency of evaluation. Table 1 shows the effective runtime for each of the methods used.

Table 1: Inpainting runtime. Different masks drawing patches from different sized areas were tested on a 1280 x 720 frame; the selected area to inpaint was an ellipse with width and height 1/16th of the frame.

  Inpainting mask    Area inpainted from    Seconds
  Full mask          Full frame             10+
  Full mask          1/4 frame              4.85
  Thin mask          1/4 frame              4.05
  Thick mask         1/4 frame              4.2

We first used the initial mask from Figure 3b and extracted patches for inpainting from the entire 1280 x 720 frame; with this method, obtaining the inpainted result took about 10 seconds on average. To reduce the runtime, we restricted patch extraction to roughly a quarter of the frame, which gave a runtime of 4.85 seconds on average.

To improve the runtime further, we retained the inpainting information from the first frame and reduced the mask so that only the edges of the selected region were inpainted, to account for slight movements. With 10 iterations of OpenCV's erosion function (the thin mask), the runtime dropped to 4.05 seconds, but the quality of the inpainting deteriorated significantly. To improve the quality, we used a thicker mask by setting the number of erosion iterations to 20, which gave a runtime of 4.2 seconds. Although this method is slightly slower than the thin mask, we retained it as our final method because of its higher inpainting quality. There is a trade-off between inpainting quality and runtime, and we chose the method that gave the best results among those we explored.
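For illustration only, measurements like those in Table 1 could be gathered with a small timing harness of the following shape. Here inpaintWithMask is a stand-in (OpenCV's Telea inpainting) rather than our PatchMatch implementation, and the mask construction mirrors the thin (10 erosion iterations) and thick (20 iterations) variants described above.

```cpp
// Hypothetical timing harness comparing the mask variants from Table 1.
#include <opencv2/imgproc.hpp>
#include <opencv2/photo.hpp>
#include <chrono>
#include <cstdio>

// Stand-in for the PatchMatch-based inpainting in the HoloSwap DLL; OpenCV's
// Telea inpainting is used here only so that the harness is self-contained.
static cv::Mat inpaintWithMask(const cv::Mat& frame, const cv::Mat& mask)
{
    cv::Mat result;
    cv::inpaint(frame, mask, result, 3.0, cv::INPAINT_TELEA);
    return result;
}

static double timeInpaint(const cv::Mat& frame, const cv::Mat& mask)
{
    auto start = std::chrono::steady_clock::now();
    inpaintWithMask(frame, mask);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    // Synthetic 1280x720 frame and an elliptical selection whose width and
    // height are 1/16th of the frame dimensions.
    cv::Mat frame(720, 1280, CV_8UC3, cv::Scalar(200, 200, 200));
    cv::Mat fullMask = cv::Mat::zeros(frame.size(), CV_8UC1);
    cv::ellipse(fullMask, cv::Point(640, 360), cv::Size(1280 / 32, 720 / 32),
                0, 0, 360, cv::Scalar(255), cv::FILLED);

    // Thin and thick ring masks from 10 and 20 erosion iterations (Section 5.1).
    cv::Mat eroded10, eroded20;
    cv::erode(fullMask, eroded10, cv::Mat(), cv::Point(-1, -1), 10);
    cv::erode(fullMask, eroded20, cv::Mat(), cv::Point(-1, -1), 20);
    cv::Mat thinMask = fullMask - eroded10;
    cv::Mat thickMask = fullMask - eroded20;

    std::printf("full:  %.2f s\n", timeInpaint(frame, fullMask));
    std::printf("thin:  %.2f s\n", timeInpaint(frame, thinMask));
    std::printf("thick: %.2f s\n", timeInpaint(frame, thickMask));
    return 0;
}
```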
5.2 Object Replacement

We used the Stanford bunny mesh to replace the selected object. Object replacement was performed by inpainting the selected object in pixel-space and then placing the bunny at the location of the previously selected object. In most of the test results, the bunny mesh appeared to be placed at the correct location and occluded the previously selected object completely. The runtime for object replacement was similar to that of inpainting the selected object using the full mask and information from the full frame. The results were also significantly better for selected objects that contrasted sharply with their background. Object replacement appeared to produce better results on the augmented reality device than removal of the selected object alone.

6 DISCUSSION

There are several limitations of the HoloLens that affect the user experience with HoloSwap. First, the user can still see the real object behind the inpainting, since the HoloLens uses a see-through display. However, when HoloSwap is used in an environment where the selected object is surrounded by white background objects and the HoloLens display brightness is at 100%, the real object becomes hard to see because of the display brightness.

Second, every user has a different experience because of differences in eye dominance. For many users the display plane appears slightly shifted, since the HoloLens display does not account for eye dominance.

Third, there is a limit to how much we can inpaint and display at once. Because the HoloLens webcam has a smaller field of view than the human eye, we cannot inpaint the full field of view that the user can see. In addition, since the HoloLens display itself has a small field of view, it cannot show all of the content generated by HoloSwap. Consequently, if the selected object is large or close to the user, the user cannot completely swap the selected object with the replacement object.

7 CONCLUSION

We tested the viability of the HoloLens for inpainting and object removal and replacement in real time. We achieved near real-time inpainting on the HoloLens and believe that, with a more efficient algorithm, we could reach real-time performance. We found that using a thick ring-shaped mask was the best way to implement video inpainting with consistency between frames. Among our three test cases, object replacement, object inpainting with a solid plane, and object inpainting with a vignetted plane, object replacement had the most believable results. The HoloLens has certain disadvantages, such as its limited field of view, eye separation, and the see-through display. However, object replacement on see-through, head-mounted displays can be reasonable under certain circumstances, such as white or light backgrounds and replacement objects.

ACKNOWLEDGEMENTS

The authors wish to thank Matthew Turk and Tobias Höllerer for their support. We would also like to thank Adam and Brandon for letting us use the HoloLens in time of need.

REFERENCES

[1] C. Barnes, E. Shechtman, A. Finkelstein, and D. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (TOG), 28(3):24, 2009.
[2] J.-Y. Bouguet. Pyramidal implementation of the affine Lucas Kanade feature tracker: Description of the algorithm. Intel Corporation, 5(1-10):4, 2001.
[3] D. A. Bowman, E. Kruijff, J. J. LaViola, and I. Poupyrev. 3D User Interfaces: Theory and Practice. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 2004.
[4] G. Bradski. The OpenCV library. Dr. Dobb's Journal of Software Tools, 2000.
[5] A. Criminisi, P. Perez, and K. Toyama. Object removal by exemplar-based inpainting. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 2, pages II–721. IEEE, 2003.
[6] L.-J. Deng, T.-Z. Huang, and X.-L. Zhao. Exemplar-based image inpainting using a modified priority definition. PLoS ONE, 10(10):e0141199, 2015.
[7] I. Drori, D. Cohen-Or, and H. Yeshurun. Fragment-based image completion. In ACM Transactions on Graphics (TOG), volume 22, pages 303–312. ACM, 2003.
[8] A. Enomoto and H. Saito. Diminished reality using multiple handheld cameras. In Proc. ACCV, volume 7, pages 130–135. Citeseer, 2007.
[9] P. Harrison. A non-hierarchical procedure for re-synthesis of complex textures. University of West Bohemia, 2001.
[10] J. Herling and W. Broll. Advanced self-contained object removal for realizing real-time diminished reality in unconstrained environments. In Mixed and Augmented Reality (ISMAR), 2010 9th IEEE International Symposium on, pages 207–212. IEEE, 2010.
[11] J. Herling and W. Broll. High-quality real-time video inpainting with PixMix. IEEE Transactions on Visualization and Computer Graphics, 20(6):866–879, 2014.
[12] N. Kawai, M. Yamasaki, T. Sato, and N. Yokoya. AR marker hiding based on image inpainting and reflection of illumination changes. In Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, pages 293–294. IEEE, 2012.
[13] A. C. Kokaram, R. D. Morris, W. J. Fitzgerald, and P. J. Rayner. Detection of missing data in image sequences. IEEE Transactions on Image Processing, 4(11):1496–1508, 1995.
[14] O. Korkalo, M. Aittala, and S. Siltanen. Light-weight marker hiding for augmented reality. In Mixed and Augmented Reality (ISMAR), 2010 9th IEEE International Symposium on, pages 247–248. IEEE, 2010.
[15] O. Miksik, V. Vineet, M. Lidegaard, R. Prasaath, M. Nießner, S. Golodetz, S. L. Hicks, P. Perez, S. Izadi, and P. H. S. Torr. The semantic paintbrush: Interactive 3D mapping and recognition in large outdoor spaces. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015.
[16] A. Newson, A. Almansa, M. Fradet, Y. Gousseau, and P. Pérez. Video inpainting of complex scenes. SIAM Journal on Imaging Sciences, 7(4):1993–2019, 2014.
[17] B. Nuernberger, K.-C. Lien, T. Höllerer, and M. Turk. Interpreting 2D gesture annotations in 3D augmented reality. In 2016 IEEE Symposium on 3D User Interfaces (3DUI), pages 149–158, March 2016.
[18] O. Oda and S. Feiner. 3D referencing techniques for physical objects in shared augmented reality. In 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 207–215, Nov 2012.
[19] A. Olwal, H. Benko, and S. Feiner. SenseShapes: Using statistical geometry for object selection in a multimodal augmented reality system. In Proceedings of the 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, ISMAR '03, pages 300–, Washington, DC, USA, 2003. IEEE Computer Society.
[20] S. D. Rane, G. Sapiro, and M. Bertalmio. Structure and texture filling-in of missing image blocks in wireless transmission and compression applications. IEEE Transactions on Image Processing, 12(3):296–303, 2003.
[21] J. Shi and C. Tomasi. Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on, pages 593–600. IEEE, 1994.
[22] S. Siltanen. Diminished reality for augmented reality interior design. The Visual Computer, pages 1–16, 2015.
[23] S. Siltanen, H. Sarasp, and J. Karvonen. [Demo] A complete interior design solution with diminished reality. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 371–372, Sept 2014.
[24] G. Turk and M. Levoy. The Stanford bunny, 1993.