  1. Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain Masaya Okamoto and Hideki Nakayama Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan © The University of Tokyo 1
  2. Outline • Background • Related work • Proposed method • Experiments • Conclusion • Future work
  3. Background • A lot of hand-labeled data is necessary for image recognition – PASCAL VOC2012: 11,530 labeled images • Labeling images by hand is tough work – Lack of hand-labeled data • Many labeled (tagged) images exist on the web – We can't use web images directly (Example images of PASCAL VOC2012) → Domain Adaptation
  4. Domain Adaptation • Learning from another domain (figure: train on one domain, test on another) ※ From CVPR 2012 Tutorial on Domain Transfer Learning for Vision Applications
  5. Source and Target • Source domain: many labeled samples (train) • Target domain: few labeled samples (test) (figure: “Cup” images from each domain)
  6. Difficulty of domain adaptation • Simple methods don't work across domains (table: averages over 31 classes) From “Adapting visual category models to new domains,” K. Saenko…
  7. Related work • Semi-supervised domain adaptation – Assumes a few labeled examples in the target domain – Saenko et al. [1] [ECCV 2010] • First work on visual domain adaptation • Unsupervised domain adaptation – No labeled examples are used in the target domain – Preferable but quite difficult – Gong et al. [4] [CVPR 2012] – Fernando et al. [5] [ICCV 2013]
  8. Subspace-based methods • Generate “virtual” domains that blend the properties of source and target • Geodesic flow sampling (GFS) by Gopalan et al. [3] – Generates multiple subspaces by sampling points along the geodesic flow on the Grassmann manifold From “Domain Adaptation for Object Recognition: An Unsupervised Approach,” R. Gopalan …
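GFS's sampling step can be sketched in a few lines of numpy. This is a minimal illustration following the standard Grassmann-geodesic parameterization, not the authors' code: given orthonormal bases Ps and Pt of the source and target subspaces, it generates intermediate subspaces Φ(t) for t between 0 (source) and 1 (target).

```python
import numpy as np

def geodesic_subspaces(Ps, Pt, ts):
    """Sample subspaces along the Grassmann geodesic from span(Ps) to
    span(Pt). Ps, Pt are D x d with orthonormal columns; assumes
    D >= 2d and nonzero principal angles."""
    D, d = Ps.shape
    Q, _ = np.linalg.qr(Ps, mode="complete")
    Rs = Q[:, d:]                                  # orthogonal complement of Ps
    U1, cos_th, Vt = np.linalg.svd(Ps.T @ Pt)      # principal angles via SVD
    theta = np.arccos(np.clip(cos_th, -1.0, 1.0))
    # Recover U2 from -Rs^T Pt = U2 diag(sin theta) V^T (same V as above)
    U2 = -Rs.T @ Pt @ Vt.T @ np.diag(1.0 / np.sin(theta))
    # Phi(0) spans Ps, Phi(1) spans Pt, 0 < t < 1 gives intermediate domains
    return [Ps @ U1 @ np.diag(np.cos(t * theta))
            - Rs @ U2 @ np.diag(np.sin(t * theta)) for t in ts]
```

Projecting source and target features onto each sampled Φ(t) and concatenating gives the GFS representation; GFK [4] replaces the explicit sampling with a closed-form integral over all t.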
  9. Subspace-based methods • Geodesic flow kernel (GFK) by Gong et al. – An analytic solution to the sampling-based approach • The subspace-based approach is probably the most successful current strategy From “Geodesic Flow Kernel for Unsupervised Domain Adaptation,” B. Gong …
  10. Subspace-based methods • To give the source subspace a semantic distribution, PLS is applied using the labels • [Problem] PLS cannot be applied to the target domain because it lacks cues such as labels (figure: source and target subspaces with “Cup” and “Monitor” clusters)
  11. Our core idea • Previous work on domain adaptation uses only visual information in the target domain → lack of semantic information in the target subspace • Use subsidiary non-visual data as semantic cues in subspace-based methods – e.g., depth, location data (GPS), gyroscopes …
  12. Proposed method • Using PLS instead of PCA to generate the source subspace is known to improve performance [4] • We propose using PLS to generate the target subspace as well – Subsidiary information serves as the predicted variables – Our method improves the distribution of data in the target subspace
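The PLS subspace construction above can be sketched as follows. This is a hypothetical minimal illustration, not the authors' code: a NIPALS-style PLS that extracts d weight directions from image features X, using the subsidiary signals Y (e.g., depth features on the target side, or label indicators on the source side) as the predicted variables. The columns of the returned matrix span the subspace.

```python
import numpy as np

def pls_subspace(X, Y, d, n_iter=500, tol=1e-12):
    """NIPALS-style PLS: extract d weight directions from data X
    (n_samples x n_features) using predicted variables Y
    (n_samples x n_signals)."""
    X = X - X.mean(axis=0)          # center both blocks
    Y = Y - Y.mean(axis=0)
    W = []
    for _ in range(d):
        u = Y[:, 0].copy()          # initial Y-score
        for _ in range(n_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)  # X-weight direction
            t = X @ w               # X-scores
            c = Y.T @ t
            c /= np.linalg.norm(c)  # Y-weight direction
            u_new = Y @ c           # Y-scores
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        t = X @ w
        p = X.T @ t / (t @ t)       # X-loadings
        X = X - np.outer(t, p)      # deflate X
        Y = Y - np.outer(t, Y.T @ t / (t @ t))  # deflate Y
        W.append(w)
    return np.column_stack(W)       # n_features x d
```

The first direction found this way is the dominant left singular vector of the cross-covariance X^T Y, i.e., the direction in feature space most correlated with the subsidiary signal, which is exactly the semantic cue the method exploits.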
  13. Difference between ours and others • Original GFK or SA – Source: a lot of labeled images – Target: a lot of unlabeled images • Our work – Source: a lot of labeled images – Target: a lot of unlabeled images plus a subsidiary signal (figure: source/target subspaces with “Cup” and “Monitor” clusters)
  14. (figure) Source subspace: source images with labels; target subspace: target images with subsidiary info.
  15. (figure) 1. PLS in the source subspace
  16. (figure) 2. PLS in the target subspace
  17. (figure) 3. Subspace-based domain adaptation
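Step 3 can be instantiated with Subspace Alignment [5], the simpler of the two base methods used in the experiments. A minimal sketch, assuming PCA-built subspaces (in the proposed variants, PLS bases would be substituted): compute source and target bases, align the source basis to the target one with M = Ps^T Pt, and project both domains into the aligned space.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions (columns) of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                     # n_features x d

def subspace_alignment(S, T, d):
    """Subspace Alignment (Fernando et al. [5]): align the source
    basis to the target basis, then project both domains."""
    Ps = pca_basis(S, d)                # source subspace
    Pt = pca_basis(T, d)                # target subspace
    M = Ps.T @ Pt                       # alignment matrix
    Zs = S @ Ps @ M                     # aligned source features
    Zt = T @ Pt                         # target features
    return Zs, Zt
```

A nearest-neighbour classifier trained on Zs and tested on Zt completes the pipeline; swapping pca_basis for a PLS basis (labels on the source side, subsidiary signals on the target side) yields the proposed variants.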
  18. Experiment settings • Use distance features as subsidiary information – Depth features extracted with depth kernel descriptors (Bo et al.) [10] – Obtained a 14000-dim distance feature for each image • Vary the number of source samples – 120, 300, 600, 1800, and 3000 samples • Chose the best subspace dimension from 10, 20, 30, 40, or 50 for each case
  19. Experiment settings • B3DO [8] as the target domain data – Evaluate classification accuracy over 6 classes (figure: RGB image and depth image (subsidiary information) pair)
  20. Number of samples • Source: ImageNet, Target: B3DO [8]

| Class    | ImageNet (Source) | B3DO (Target) |
|----------|-------------------|---------------|
| Bottle   | 920               | 238           |
| Bowl     | 919               | 142           |
| Cup      | 919               | 258           |
| Keyboard | 1512              | 129           |
| Monitor  | 1134              | 243           |
| Sofa     | 982               | 109           |
| SUM      | 6386              | 1119          |
| AVG      | 1064.3            | 186.5         |

ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, J. Deng…
  21. Difference in dataset Class: Cup Source: ImageNet Target: B3DO
  22. Experiment settings • Test 2 subspace-based methods to show that our method improves performance consistently ① Geodesic Flow Kernel (GFK) [4] ② Subspace Alignment (SA) [5] • Compare 4 methods 1. Our method 1 (Source: PCA → Target: PLS) 2. Baseline 1 (Source: PCA → Target: PCA) 3. Our method 2 (Source: PLS → Target: PLS) 4. Baseline 2 (Source: PLS → Target: PCA)
  23. Experimental results (GFK) • Geodesic Flow Kernel (GFK) [4] as the subspace-based method

| Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2 |
|----------------|-------|-----------|-------|-----------|
| 120            | 28.33 | 28.95     | 32.35 | 31.64     |
| 300            | 29.31 | 29.85     | 32.71 | 31.55     |
| 600            | 29.04 | 28.60     | 32.53 | 28.87     |
| 1800           | 32.17 | 30.92     | 34.32 | 31.81     |
| 3000           | 33.42 | 31.72     | 34.94 | 33.92     |
  24. Result graph of GFK [4]
  25. Experimental results (SA) • Subspace Alignment (SA) [5] as the subspace-based method

| Num of samples | OURS1 | Baseline1 | OURS2 | Baseline2 |
|----------------|-------|-----------|-------|-----------|
| 120            | 34.05 | 29.85     | 34.23 | 30.83     |
| 300            | 33.15 | 30.21     | 32.17 | 31.90     |
| 600            | 33.78 | 33.15     | 33.33 | 32.71     |
| 1800           | 33.15 | 30.21     | 32.17 | 31.90     |
| 3000           | 34.85 | 32.44     | 33.69 | 32.89     |
  26. Result graph of SA [5]
  27. Accuracy and exec. time • Classification accuracy and average execution time when using 20 source images per class • The proposed methods incur slightly higher computational cost

|            | OURS1 | Baseline1 | OURS2   | Baseline2 |
|------------|-------|-----------|---------|-----------|
| GFK        | 28.33 | 28.95     | 32.35   | 31.64     |
| Exec. time | 3.83s | 2.26s     | 135.17s | 128.03s   |
| SA         | 34.05 | 29.85     | 34.23   | 30.83     |
| Exec. time | 3.07s | 0.98s     | 130.90s | 120.30s   |
  28. Conclusion • The proposed methods outperform previous ones that use only visual information • Subsidiary information can improve domain adaptation accuracy – Consistent improvement on two independent base methods • As far as we know, this is the first visual domain adaptation method to use non-visual information in the target domain
  29. Future work • Handling and testing other multimodal information such as gyroscope or sound data • Extensive experiments – Currently focused on only 6 classes – Testing other classes and other subspace-based methods
  30. Contacts • Masaya Okamoto • Nakayama Lab., The University of Tokyo • e-mail: Thank you!
  31. References (1/2) [1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. of ECCV, 2010. [2] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proc. of ICML, 2007. [3] R. Gopalan, R. Li, and R. Chellappa, “Domain adaptation for object recognition: an unsupervised approach,” in Proc. of ICCV, 2011. [4] B. Gong, Y. Shi, and F. Sha, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. of CVPR, 2012. [5] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, “Unsupervised visual domain adaptation using subspace alignment,” in Proc. of ICCV, 2013.
  32. References (2/2) [6] H. Wold, S. Kotz, and N. L. Johnson, “Partial least squares,” in Encyclopedia of Statistical Sciences, 1985. [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in Proc. of CVPR, 2009. [8] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell, “A category-level 3-d object dataset: putting the kinect to work,” in Proc. of ICCV Workshop on Consumer Depth Cameras in Computer Vision, 2011. [9] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [10] L. Bo, X. Ren, and D. Fox, “Depth kernel descriptors for object recognition,” in Proc. of IROS, 2011.
  33. Why use depth as subsidiary info? ① Easy to collect • Some publicly available datasets (like B3DO) ② Easier situation (we guess) • Depth information may have a strong correlation with classes ③ Depth sensors will be used in wearable devices • “Project Tango” by Google (a smartphone with a Kinect-like camera)
  34. System overview • The system doesn't need labeled samples from the user • Better than using only visual information – Using subsidiary info makes the results better (figure: source images from the web with class labels such as “Chair”; target images with distance features from depth images; recognition)
  35. Life logging • Life-logging systems are spreading • They carry much subsidiary information (sound, gyro, …) → a different situation from previous works • In the near future, this situation is expected to become common
  36. Experimental process flow • PLS on the source (jack-knifing) – Because the predicted signals (labels) are low-dimensional – An iterative process with high computational cost • PLS on the target (traditional) – Because the predicted signals have enough dimensions (14000-dim) • Subspace-based method – GFK or SA
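For the target side, where the predicted signals are high-dimensional, the PLS directions can be obtained non-iteratively. A hedged sketch (hypothetical helper, not the authors' code): take the leading PLS weight directions as the top left singular vectors of the cross-covariance matrix X^T Y. The first direction coincides with the standard iterative (NIPALS) solution; later ones approximate it, since no deflation is performed.

```python
import numpy as np

def pls_weights_svd(X, Y, d):
    """Non-iterative PLS weight directions: top-d left singular vectors
    of the cross-covariance X^T Y. Suited to the target side, where Y
    (e.g. 14000-dim depth features) is high-dimensional."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc                           # cross-covariance, f x q
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :d]                         # f x d, orthonormal columns
```

This reduces the target-side step to a single SVD (an eigenvalue problem on C C^T), which is why it is much cheaper than the iterative source-side procedure.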

Editor's Notes

  1. Hello, everyone. My name is Masaya Okamoto. I'm from the University of Tokyo, Japan. I'm glad to be here. I'll talk about “Unsupervised Visual Domain Adaptation Using Auxiliary Information in Target Domain”.
  2. This is the outline of my talk. First, I'll describe the background of our research. Second, I'll review previous visual domain adaptation work and the differences between it and ours. Next, I'll explain the core idea and the details of the proposed method. Then I'll talk about the experiments and their results. Finally, I'll present the conclusion and future work.
  3. Recently, image recognition systems need many hand-labeled images for training. For example, PASCAL VOC 2012 used over 11 thousand labeled images. We suffer from a lack of hand-labeled images because labeling by hand is tough work. On the other hand, there are many labeled images on the web, but we can't use these web images directly. Therefore, domain adaptation techniques have gathered more and more attention.
  4. (Click) This figure shows an overview of domain adaptation. (Click) Domain adaptation means learning from one domain's images and testing on another domain's images. As you can see, the classifier learns from images with different characteristics.
  5. The domain where a classifier is trained is called the “source domain” and is expected to provide a lot of labeled data. The domain in which the classifier is actually tested is called the “target domain” and is assumed to have characteristics, such as illumination and resolution, that differ from the source domain. This figure shows an example of the difference between the two domains.
  6. I'll explain the difficulty of domain adaptation. This is a result from a previous work. The table shows classification scores averaged over 31 classes. If a classifier is trained and tested in the same domain, as in the upper part of the table, it achieves a good score. But if it is trained in one domain and tested in another, as in the lower part, classifiers like support vector machines or Naive-Bayes Nearest Neighbor don't work well.
  7. Many visual domain adaptation methods have been proposed so far. Saenko et al. proposed the first work on domain adaptation for image recognition in 2010. It was semi-supervised domain adaptation, which assumes a few labeled examples in the target domain. (Click) After that, Gong et al., Fernando et al., and others proposed several unsupervised visual domain adaptation methods, which don't need labeled samples in the target domain. Considering that our objective is to reduce the cost of manual labeling, the unsupervised setting is the ultimate goal of domain adaptation, but it is a very difficult task. (Click) We focus on the unsupervised domain adaptation setting.
  8. In the following slides, I will explain previous subspace-based domain adaptation methods. Currently, the subspace-based approach is known to be a promising strategy for unsupervised domain adaptation. Subspace-based methods generate “virtual” domains that blend the properties of source and target. The first subspace-based method was proposed by Gopalan et al. as geodesic flow sampling, GFS for short. First of all, GFS generates subspaces for the source and target domains respectively. Next, it generates multiple intermediate subspaces between the source and target ones by sampling points from the geodesic flow on the Grassmann manifold. One problem of GFS is the trade-off between performance and the dimensionality of the feature vectors, which depends on the number of sampled intermediate subspaces. In other words, to improve performance we need to take more intermediate subspaces, but this results in higher computational cost. Some methods relax this problem.
  9. One of these methods is the geodesic flow kernel, GFK for short, proposed by Gong et al. GFK is an analytic solution to the sampling-based approach. As mentioned, such subspace-based approaches are currently known to be a promising strategy for unsupervised domain adaptation.
  10. The first step of subspace-based methods is generating the source and target subspaces. Considering the following processes and the “virtual” intermediate domains, each subspace has to have a semantic distribution. In previous works, partial least squares analysis is applied with labels to give the source subspace a semantic distribution. But we can't generate a semantic distribution in the target subspace because the target domain doesn't have semantic cues like labels.
  11. This slide shows the core idea of our method. Previous works on visual domain adaptation use only visual information in the target domain. (Click) So we suffer from a lack of semantic information in the target subspace. In our opinion, we have to exploit subsidiary data for further improvement. (Click) Thus, we propose a method that uses non-visual data, such as distance, location, or gyroscope information, as semantic cues.
  12. From previous work, we know that applying partial least squares instead of principal component analysis to generate the source subspace improves domain adaptation performance. From now on, PLS means partial least squares and PCA means principal component analysis. Based on this knowledge, the proposed method applies PLS instead of PCA to the target subspace. Our method improves the distribution of data in the target subspace by using subsidiary information as cues.
  13. The figures show the difference between ours and other unsupervised domain adaptation methods. The source domain has a large number of labeled images. Our work assumes no labeling on the target domain, like other works, but subsidiary signals are provided. We emphasize that subsidiary signals are provided only in the target domain. Thus, our method doesn't simply expand the features for performance. (Memo: review of subspace-based methods — the sampling one, then GFK as the analytic solution; explain the latest methods; the target hasn't been made semantic, right?)
  14. Let me talk about the process flow of the proposed method. This picture is an illustration of our method. The left side of the figure shows the source subspace; all source images have class labels. The right side shows the target subspace; the target images have no labels but do have subsidiary information.
  15. First, we apply partial least squares analysis to the source domain using the class labels as predicted variables.
  16. Second, we apply partial least squares analysis to the target domain using the subsidiary information as predicted variables. Thus, we also give the target domain a semantic distribution. The subsidiary information is used only in this process.
  17. Finally, we apply subspace-based domain adaptation. We improve on previous methods by creating semantic distributions in both the source and target domains.
  18. Let me mention the experiments. We used distance features as subsidiary information. The features were extracted with the depth kernel descriptors proposed by Bo et al. Actually, we obtained a 14000-dimensional feature from each depth image. We changed the number of source samples from 20 to 500 per class, 120 to 3000 samples in total. We experimentally chose the subspace dimension among 10, 20, 30, 40, and 50 to maximize the classification accuracy in each case, because fixed dimensions may bias a particular method to work better.
  19. We used the B3DO dataset from “A category-level 3-d object dataset: putting the kinect to work”. B3DO is a publicly available RGB-D dataset proposed by Janoch et al. This figure shows examples from the B3DO dataset; RGB image and depth image pairs are provided.
  20. This table shows the numbers of source and target images. Source images were obtained from ImageNet and target images from the B3DO dataset. All images were cropped.
  21. This figure shows the actual differences between the experimental datasets for the cup class. As you can see, there are many differences, such as lighting, resolution, and background.
  22. To prove that the proposed method improves performance consistently, we used 2 independent state-of-the-art subspace-based domain adaptation methods as bases: the first is the geodesic flow kernel, the second is subspace alignment. To evaluate the performance of our method, we compared 4 methods. The first is proposed method 1, applying PCA to the source and PLS to the target. The second is baseline 1, PCA on both source and target. The third is proposed method 2, applying PLS to both source and target. The fourth is baseline 2, PLS on the source and PCA on the target. (Click) The comparison of our method 1 and baseline 1 illustrates the effectiveness of our approach when PCA is used for building the source subspace. (Click) Similarly, our method 2 and baseline 2 are comparable when PLS is used in the source domain. We expected to observe the respective improvements in each case.
  23. This table shows the results when the GFK method is used as the base. OURS2 was the best in every case. (Memo: maybe the graph alone is enough — delete this if running over time; split the comparison down the middle.)
  24. This figure shows the results of the experiments on the geodesic flow kernel method. The red and blue lines are the proposed methods. In this case the blue line, our method 2, which applies PLS to both the source and target subspaces, was the best.
  25. This table shows the results when the subspace alignment method is used as the base. Our method 1 was the best in every case.
  26. This figure shows the results of the experiments on the subspace alignment method. In this case the blue line, our method 1, which applies PLS to the target and PCA to the source subspace, was the best.
  27. In this slide, we discuss the execution time of each method. “Exec. time” in the table shows the average execution time. The proposed methods take more calculation time than the baselines: about 2 seconds more in the cases where PCA is applied to the source, and about 10 seconds more where PLS is applied. But we think this is acceptable because the extra calculation time is negligible, especially in the case where PLS is applied to the source domain.
  28. Let me talk about the conclusion. The proposed methods, which additionally use non-visual information in the target domain, are better than previous ones. We emphasize again that subsidiary signals are provided only in the target domain, and our method doesn't simply expand the features for performance. We showed that subsidiary information can improve domain adaptation accuracy. The experimental results show that our method is effective and valid because it consistently improved performance on two independent state-of-the-art subspace-based methods. We also proposed a new domain adaptation task that assumes the target domain has some subsidiary non-visual information, and this is the first method to use non-visual information.
  29. For future work, the first item is handling and testing other multimodal information, such as gyroscope or sound data obtained when a picture is taken. The second is extending the experiments: we have to test more classes and more subspace-based methods. (Prepared answers: “I think you are right about (topic/information) …” “It is a problem of our method; it is future work.”)
  30. Thank you very much. (Prepared answers: “I'm sorry, I don't have that information now, but I guess …” “Is your question about ~ (section/figure)?” “Actually, I can't answer your question, but I guess that …” “This is difficult to explain, but I'd be pleased to talk about it later.” “Sorry, but that is outside the scope of this study.” “Does that answer your question?”)
  31. There are three reasons. First, it is easy to collect; there are some publicly available datasets like B3DO. Second, we think distance information makes the problem easier, because distance features may have a stronger correlation with classes than location or sound. Third, depth sensors will be used in wearable devices; Google announced Project Tango, which gives a smartphone a built-in Kinect-like camera. That's why we chose distance information as the subsidiary information.
  32. The first step is applying jack-knifing PLS to the source domain, because the labels used as predicted signals in the source domain don't have enough dimensions; it is an iterative process with high computational cost. The second step is applying normal PLS to the target space by solving an eigenvalue problem, which has low computational cost; the distance features used as predicted signals in the target domain have enough dimensions. The third step is applying the subspace-based methods, experimentally GFK or SA.
  34. This slide shows an overview of our method. Our system consists of 3 steps. The first step is generating the source and target subspaces for dimensionality reduction. The second is … The third is … From the next slide, I'll explain each step in detail.
  35. In this study, we focus on the unsupervised domain adaptation setting. Previous works used only visual information in the target domain for domain adaptation. In our opinion, this is a cause of the difficulty of domain adaptation. So, we propose a new domain adaptation task with subsidiary information and propose the first method for it.