Hello, everyone. My name is Masaya Okamoto. I’m from the University of Tokyo, Japan.
I’m glad to be here. Today I’ll talk about “Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain”.
This is the outline of my talk.
First, I’ll describe the background of our research.
Second, I’ll review previous work on visual domain adaptation and how ours differs from it.
Next, I’ll explain the core idea and the details of the proposed method.
Then I’ll present the experiments and their results.
Finally, I’ll give the conclusion and future work.
Recently, image recognition systems have come to need many hand-labeled images for training.
For example, PASCAL VOC 2012 used over 10 thousand labeled images.
We suffer from a lack of hand-labeled images because labeling by hand is tough work.
On the other hand, there are many labeled images on the web. But we can’t use these web images directly.
Therefore, domain adaptation techniques have gathered more and more attention.
(Click) This figure shows an overview of domain adaptation. (Click)
Domain adaptation means training on images from one domain and testing on images from another domain.
As you can see, the classifier learns from images whose characteristics differ from those of the test images.
The domain where a classifier is trained is called the “source domain” and is expected to provide a lot of labeled data.
The domain in which the classifier is actually tested is called the “target domain” and is assumed to have different
characteristics, such as illumination and resolution, from the source domain.
This figure shows an example of the difference between two domains.
I’ll explain the difficulty of domain adaptation. This is the result from a previous work.
This table shows classification scores averaged over 31 classes. If a classifier is trained and tested in the same domain, as in the upper part of the table,
it achieves a good score. But if a classifier is trained in one domain and tested in another domain, as in the lower part,
classifiers such as the support vector machine or Naive-Bayes Nearest Neighbor don’t work well.
There has been a lot of work on visual domain adaptation so far. Saenko et al. proposed the first work on domain adaptation for image recognition in 2010. It was a semi-supervised domain adaptation method that assumes a few labeled examples in the target domain. (Click)
After that, Gong et al., Fernando et al., and others proposed several unsupervised visual domain adaptation methods.
These don’t need labeled samples in the target domain. Considering that our objective is to reduce the cost of manual labeling,
an unsupervised setting is the ultimate goal of domain adaptation, but it is a very difficult task. (Click) We focus on the unsupervised domain adaptation setting.
In the following slides, I will explain previous work on subspace-based domain adaptation methods. Currently, subspace-based approaches like these are known to be a promising strategy for unsupervised domain adaptation. Subspace-based methods generate “virtual” domains that blend the properties of the source and target.
The first subspace-based method was proposed by Gopalan et al. as geodesic flow sampling, GFS for short. First of all, GFS generates subspaces for the source and target domains respectively. Next, it generates multiple intermediate subspaces between the source and target ones by sampling points from the geodesic flow on the Grassmann manifold. One problem of GFS is the trade-off between performance and the dimensionality of the feature vectors, which depends on the number of sampled intermediate subspaces. In other words, to improve performance we need to take more intermediate subspaces, but this results in higher computational cost. Some methods relax this problem.
One of these methods is the geodesic flow kernel, GFK for short.
It was proposed by Gong et al.
GFK is an analytic solution to the sampling-based approach: it integrates over all the subspaces along the geodesic flow in closed form instead of sampling them.
As I said, subspace-based approaches like these are currently known to be a promising strategy for unsupervised domain adaptation.
The first step of subspace-based methods is generating the source and target subspaces. Considering the later processing and the “virtual” intermediate domains, the data in each subspace should have a semantically meaningful distribution. In previous work, the source subspace was given such a distribution by applying partial least squares analysis with the labels.
But we can’t generate a semantic distribution in the target domain, because the target domain doesn’t have semantic cues like labels.
This slide shows the core idea of our method. Previous work on visual domain adaptation uses only visual information in the target domain. (Click)
So it suffers from a lack of semantic information in the target subspace.
In our opinion, we have to exploit subsidiary data for further improvement. (Click)
Thus, we propose a method that uses non-visual data, such as distance, location, or gyroscope information, as semantic cues.
From previous work, we know that applying partial least squares instead of principal component analysis to generate the source subspace improves domain adaptation performance.
From now on, PLS means partial least squares and PCA means principal component analysis.
Based on this knowledge, the proposed method applies PLS instead of PCA to the target subspace.
Our method improves the distribution of data in the target subspace by using subsidiary information as cues.
The figure shows the difference between ours and other unsupervised domain adaptation methods.
The source domain has a large number of labeled images.
Our work assumes no labels in the target domain, like other works.
But subsidiary signals are provided.
We emphasize that subsidiary signals are provided only in the target domain.
Thus, our method does not simply expand the features to gain performance.
Let me talk about the processing flow of the proposed method.
This picture is an illustration of our method.
The left side of the figure represents the source subspace.
All source images have class labels.
The right side represents the target subspace.
No target images have labels, but all of them have subsidiary information.
First, we apply partial least squares analysis to the source domain, using class labels as the predictive values.
Second, we apply partial least squares analysis to the target domain, using subsidiary information as the predictive values.
Thus, we also give the target domain a semantic distribution.
Subsidiary information is used only in this step.
Third, we apply a subspace-based domain adaptation method.
We improve on previous methods by creating semantic distributions in both the source and target domains.
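The flow above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the talk: the data are random stand-ins, the function names (`pca_basis`, `pls_basis`, `subspace_alignment`) are hypothetical, PLS is approximated by the simple PLS-SVD variant (singular vectors of the cross-covariance matrix), and subspace alignment stands in for the adaptation step. It follows the configuration with PCA on the source and PLS on the target.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X (n_samples x n_features), as columns."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def pls_basis(X, Y, d):
    """PLS-SVD: top-d left singular vectors of the cross-covariance X^T Y.

    Y holds the cues (class labels or subsidiary features) for each sample.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :d]

def subspace_alignment(Xs, Xt, Ps, Pt):
    """Project both domains to d dimensions, aligning the source basis to the target."""
    M = Ps.T @ Pt          # d x d alignment matrix
    Zs = Xs @ Ps @ M       # source samples in target-aligned coordinates
    Zt = Xt @ Pt           # target samples in their own subspace
    return Zs, Zt

# Random stand-ins (assumptions): 30-dim visual features, 40-dim subsidiary features.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(120, 30))   # source visual features (labels not used here)
Xt = rng.normal(size=(80, 30))    # target visual features (no labels)
At = rng.normal(size=(80, 40))    # target subsidiary features, e.g. depth descriptors

d = 10
Ps = pca_basis(Xs, d)             # source subspace from PCA
Pt = pls_basis(Xt, At, d)         # target subspace from PLS with subsidiary cues
Zs, Zt = subspace_alignment(Xs, Xt, Ps, Pt)
print(Zs.shape, Zt.shape)
```

Note that the subsidiary features `At` appear only when the target basis `Pt` is built; the classifier would be trained on `Zs` and tested on `Zt`, which are both derived from visual features alone.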
Let me move on to the experiments.
We used distance features as subsidiary information.
The features were extracted with the depth kernel descriptors proposed by Bo et al.
Specifically, we obtained a 14,000-dimensional feature from each depth image.
We varied the number of source samples from 20 to 500 per class,
which is 120 to 3,000 samples in total.
We experimentally chose the subspace dimension from among 10, 20, 30, 40, and 50 so as to
maximize the classification accuracy for each case,
because a fixed dimension may bias the comparison toward a particular method.
We used the B3DO dataset from “A category-level 3-d object dataset: putting the kinect to work”.
B3DO is a publicly available RGB-D dataset proposed by Janoch et al.
This figure shows examples from the B3DO dataset.
Pairs of an RGB image and a depth image are provided.
This table shows the number of source and target images.
The source images were obtained from ImageNet and the target images from the B3DO dataset.
All images were cropped.
This figure shows the actual difference between the domains in the experimental dataset.
This is the cup class.
As you can see, there are a lot of differences, such as lighting, resolution, and background.
As the base methods,
to show that the proposed method improves performance consistently,
we use two independent state-of-the-art subspace-based domain adaptation methods.
The first one is the geodesic flow kernel.
The second is subspace alignment.
To evaluate the performance of our method, we compared four kinds of methods.
The first one is proposed method 1, which applies PCA to the source and PLS to the target.
The second is baseline 1, which applies PCA to both the source and the target.
The third one is proposed method 2, which applies PLS to both the source and the target.
The fourth is baseline 2, which applies PLS to the source and PCA to the target.
(Click) The comparison of our method 1 and baseline 1 illustrates the effectiveness of our approach when PCA is used for building the source subspace.
(Click) Similarly, our method 2 and baseline 2 can be compared when PLS is used in the source domain.
We expected to observe an improvement in each case.
This table shows the results when the GFK method is used as the base.
OURS2 was the best in every case.
This figure shows the results of the experiments with the geodesic flow kernel method.
The red and blue lines are the proposed methods.
In this case, the blue line, our method 2, which applies PLS to both the source and target subspaces, was the best.
This table shows the results when the subspace alignment method is used as the base.
Our method 1 was the best in every case.
This figure shows the results of the experiments with the subspace alignment method.
In this case, the blue line, our method 1, which applies PLS to the target and PCA to the source subspace, was the best.
On this slide, I’ll mention the execution time of each method. “Exec time” in the table shows the average execution time.
The proposed methods take more computation time than the baselines:
about 2 seconds in the cases that apply PCA to the source, and about 10 seconds in the cases that apply PLS.
But we think this is acceptable because the extra computation time is negligible, especially in the cases that apply PLS to the source domain.
Let me move on to the conclusion.
The proposed methods, which additionally use non-visual information in the target domain, perform better than previous ones.
We emphasize again that subsidiary signals are provided only in the target domain,
and our method does not simply expand the features to gain performance.
We showed that subsidiary information can improve domain adaptation accuracy.
The experimental results show that our method is effective and valid,
because it consistently improved the performance of two independent state-of-the-art subspace-based methods.
Next, we proposed a new domain adaptation task that assumes the target domain has some subsidiary non-visual information.
And this is the first method that uses non-visual information.
As for future work,
the first item is handling and testing other multimodal information,
such as gyroscope data or sound recorded when a picture is taken.
The second is expanding the experiments.
We have to test more classes and more subspace-based methods.
I think you are right about (topic/information), ~
It is a problem with our method, and it is future work.
Thank you very much.
I’m sorry, I don’t have that information now. But I guess
Is your question about ~ (section/figure)?
Actually, I can’t answer your question, but I guess that ~
This is difficult to explain, but I’d be pleased to talk about it later.
Sorry, but that is outside the scope of this study.
Does that answer your question?
There are three reasons.
First, it is easy to collect.
There are some publicly available datasets like B3DO.
Second, we think distance information makes the problem easier,
because distance features may have a stronger correlation with classes than location or sound.
Third, depth sensors will be used in wearable devices.
Google announced Project Tango, which gives a smartphone a built-in Kinect-like camera.
That’s why we chose distance information as the subsidiary information.
The first step is applying jack-knifing PLS to the source domain.
The labels used as the predictive signal in the source domain don’t have enough dimensions.
This is an iterative process with a high computational cost.
The second step is applying normal PLS to the target domain by solving an eigenvalue problem.
This has a low computational cost.
The distance features used as the predictive signals in the target domain have enough dimensions.
The third step is applying a subspace-based method, GFK or SA in our experiments.
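The point about labels not having enough dimensions can be checked numerically. The sketch below is an illustration under assumptions, not the procedure from the talk: it uses random stand-in data, a hypothetical helper `n_pls_directions`, the simple PLS-SVD view of PLS (directions come from the cross-covariance matrix), six classes with 20 samples each (as implied by 120 samples at 20 per class), and a 200-dimensional stand-in for the much larger depth feature.

```python
import numpy as np

def n_pls_directions(X, Y):
    """Number of directions a single PLS-SVD step can extract: rank of X^T Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.linalg.matrix_rank(Xc.T @ Yc)

rng = np.random.default_rng(0)
n_classes, per_class, n_features = 6, 20, 50      # assumed sizes for illustration
n_samples = n_classes * per_class

X = rng.normal(size=(n_samples, n_features))      # stand-in visual features
labels = np.repeat(np.arange(n_classes), per_class)
Y_labels = np.eye(n_classes)[labels]              # one-hot class labels
Y_depth = rng.normal(size=(n_samples, 200))       # stand-in dense subsidiary features

r_labels = n_pls_directions(X, Y_labels)
r_depth = n_pls_directions(X, Y_depth)
print(r_labels)  # 5: centered one-hot labels span only n_classes - 1 dimensions
print(r_depth)   # 50: limited only by the visual feature dimension here
```

With six classes, one-hot labels give at most five directions in one shot, fewer than the 10 to 50 subspace dimensions used in the experiments, which is why the source side needs an iterative scheme while the target side, with dense distance features, does not.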
Our objective is summarizing egocentric moving videos to generate walking route guidance videos.
A raw video is too long to watch because it is as long as the walk along the route.
Of course, such a video is difficult to use as a route guide.
To make it usable for route guidance, our system summarizes it automatically.
This slide shows an overview of our method.
Our system consists of 3 steps.
The first step is generating the source and target subspaces for dimensionality reduction.
Third step is
From the next slide, I’ll explain the details of each step.
In this study, we focus on the unsupervised domain adaptation setting.
Previous work used only visual information in the target domain for domain adaptation.
In our opinion, this is a cause of the difficulty of domain adaptation.
So we propose a new domain adaptation task with subsidiary information, and we propose the first method for it.