21.
References
[Belhumeur1997] Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997).
Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.
[Cao2012] Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face Alignment by Explicit
Shape Regression. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Taigman2014] Taigman, Y., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing
the Gap to Human-Level Performance in Face Verification. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
[Toshev2014] Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose
estimation via deep neural networks. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR).
[Turk1991] Turk, M., & Pentland, A. (1991). Eigenfaces for Recognition. Journal of
Cognitive Neuroscience, 3(1), 71–86.
[Wiskott1997] Wiskott, L., Fellous, J.-M., Kruger, N., & Malsburg, C. von der.
(1997). Face recognition by elastic bunch graph matching. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 19(7), 775–779.
34.
Business examples (text detection/recognition)
Evernote
Recognizes text in images and indexes it for use in search
Google Goggles
License plate recognition
Word Lens
https://www.youtube.com/watch?v=h2OfQdYrHRs
Acquired by Google
35.
References
[Berg2014] Berg, T., Liu, J., Lee, S. W., Alexander, M. L., Jacobs, D.
W., & Belhumeur, P. N. (2014). Birdsnap: Large-scale Fine-grained
Visual Categorization of Birds. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
[Cheng2014] Cheng, M.-M., Zhang, Z., Lin, W.-Y., & Torr, P. (2014).
BING: Binarized Normed Gradients for Objectness Estimation at
300fps. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Kumar2012] Kumar, N., Belhumeur, P. N., Biswas, A., Jacobs, D.
W., Kress, W. J., Lopez, I., & Soares, J. V. B. (2012). Leafsnap: A
Computer Vision System for Automatic Plant Species
Identification. In European Conference on Computer Vision.
[LeCun1998] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998).
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11), 2278–2324.
36.
References
[Wang2012] Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., &
Li, S. (2012). Salient object detection for searched web
images via global saliency. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
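[木村2012] 木村昭悟 (Kimura, A.), 米谷竜 (Yonetani, R.), & 平山高嗣 (Hirayama, T.). (2012).
[サーベイ論文] 人間の視覚的注意の計算モデル [Survey paper: Computational models of
human visual attention]. 電子情報通信学会技術研究報告 [IEICE Technical Report].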
40.
Reconstructing 3D models from collections of images
Representative projects (demo videos and more at the links)
Photo Tourism[Snavely2006]
http://phototour.cs.washington.edu/
Building Rome in a Day[Agarwal2009]
http://grail.cs.washington.edu/rome/
Building Rome on a Cloudless Day [Frahm2010]
https://www.youtube.com/watch?v=4cEQZreQ2zQ
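These projects rely on structure from motion. Features are matched across photos, camera poses are estimated, and 3D points are triangulated. The sketch below covers only the last step. It assumes the 3×4 projection matrices of the cameras are already known; in the real systems they are estimated, and refined by bundle adjustment. Linear (DLT-style) triangulation then recovers a 3D point from its image projections. `project`, `solve3` and `triangulate` are hypothetical helper names. The code uses plain Python, with no NumPy, so it stays dependency-free.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 3x4 camera projection matrix


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point X to pixel coordinates (u, v) with camera matrix P."""
    Xh = list(X) + [1.0]  # homogeneous coordinates
    x, y, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return x / w, y / w


def solve3(A: Matrix, b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def triangulate(views) -> List[float]:
    """Linear triangulation from two or more (P, (u, v)) observations.

    Each observation gives two equations, (u*P[2] - P[0]) . Xh = 0 and
    (v*P[2] - P[1]) . Xh = 0, on the homogeneous point Xh = (X, Y, Z, 1).
    They are solved in the least-squares sense via the 3x3 normal equations.
    """
    rows, rhs = [], []
    for P, (u, v) in views:
        for coord, k in ((u, 0), (v, 1)):
            eq = [coord * P[2][c] - P[k][c] for c in range(4)]
            rows.append(eq[:3])
            rhs.append(-eq[3])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * bi for r, bi in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


if __name__ == "__main__":
    # Hypothetical rig: two identical cameras (focal length 1), the second shifted 1 unit along x.
    P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    X_true = (0.5, -0.2, 4.0)
    print(triangulate([(P1, project(P1, X_true)), (P2, project(P2, X_true))]))
```

With noise-free projections the recovered point matches `X_true` up to floating-point rounding. With real, noisy feature matches, the linear estimate is normally only a starting point for nonlinear refinement.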
48.
References
[Agarwal2009] Agarwal, S., Snavely, N., Simon, I., Seitz, S. M., &
Szeliski, R. (2009). Building Rome in a day. In International
Conference on Computer Vision (pp. 72–79).
[Blanz1999] Blanz, V., & Vetter, T. (1999). A morphable model for
the synthesis of 3D faces. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH) (pp. 187–194).
[Frahm2010] Frahm, J.-M., Fite-Georgel, P., Gallup, D., Johnson, T.,
Raguram, R., Wu, C., … Pollefeys, M. (2010). Building Rome on a
Cloudless Day. In European Conference on Computer Vision (pp.
368–381).
[Hoiem2005] Hoiem, D., Efros, A. A., & Hebert, M. (2005). Automatic photo
pop-up. In Conference on Computer Graphics and Interactive
Techniques (SIGGRAPH).
[Narasimhan2008] Narasimhan, S. G., Koppal, S. J., & Yamazaki, S.
(2008). Temporal Dithering of Illumination. In European Conference
on Computer Vision (pp. 830–844).
49.
References
[Pan2009] Pan, Q., Reitmayr, G., & Drummond, T. (2009).
ProFORMA: Probabilistic Feature-based On-line Rapid Model
Acquisition. In Proceedings of the British Machine Vision Conference
2009 (pp. 112.1–112.11).
[Saxena2008] Saxena, A., Sun, M., & Ng, A. Y. (2008). Make3D:
Depth Perception from a Single Still Image. In AAAI National
Conference on Artificial Intelligence (pp. 1571–1576).
[Seitz1996] Seitz, S. M., & Dyer, C. R. (1996). View morphing.
In Conference on Computer Graphics and Interactive Techniques
(SIGGRAPH).
[Snavely2006] Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo
tourism: exploring photo collections in 3D. In Conference on
Computer Graphics and Interactive Techniques (SIGGRAPH).
[松下2011] 松下康之 (Matsushita, Y.). (2011). 照度差ステレオ [Photometric stereo].
情報処理学会研究報告 [IPSJ SIG Technical Report], Vol. 2011-CVIM-177, 29.
61.
References
[Choi2015] Choi, W. (2015). Near-Online Multi-Target Tracking
With Aggregated Local Flow Descriptor. Proceedings of the IEEE
International Conference on Computer Vision, 3029–3037.
[Grundmann2011] Grundmann, M., Kwatra, V., & Essa, I. (2011).
Auto-directed video stabilization with robust L1 optimal camera
paths. Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (pp. 225–232).
[Hamid2010] Hamid, R., Kumar, R., Hodgins, J., & Essa, I. (2010). A
Computational Framework for Sports Visualization using Multiple
Static Cameras. In IEEE Conference on Computer Vision and
Pattern Recognition (pp. 1–14).
[Hasegawa2015] Hasegawa, K. (2015). Stroboscopic Image
Synthesis of Sports Player from Hand-Held Camera Sequence. In
International Conference on Computer Vision Workshop.
[Kalal2010] Kalal, Z. (2010). P-N Learning: Bootstrapping Binary
Classifiers by Structural Constraints. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
62.
References
[Lu2011] Lu, W., Ting, J., Little, J. J., & Murphy, K. P. (2011).
Learning to Track and Identify Players from Broadcast
Sports Videos, 1–14.
[Soomro2012] Soomro, K., Zamir, A. R., & Shah, M. (2012).
UCF101: A Dataset of 101 Human Actions Classes From
Videos in The Wild. arXiv preprint arXiv:1212.0402.
[Wang2013] Wang, H., Kläser, A., Schmid, C., & Liu, C. L.
(2013). Dense trajectories and motion boundary descriptors
for action recognition. International Journal of Computer
Vision, 103(1), 60–79.
[Zhao2014] Zhao, B., & Xing, E. P. (2014). Quasi Real-Time
Summarization for Consumer Videos. In IEEE Conference on
Computer Vision and Pattern Recognition.
73.
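Removing unwanted regions from an image
The unwanted region is deleted, and the resulting hole is filled in using
images found on the Internet [Hays2007].
Credit: [Hays2007]
(Figure: panels (a)–(d))
a. Original image
b. Unwanted region removed
c. Search for images with similar colors and layout
d. Fill in the removed region with an image chosen by the user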
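The sketch below is a heavily simplified version of steps (b)–(d). It assumes grayscale images stored as equal-sized 2D lists and a boolean mask marking the removed region. [Hays2007] matches scene descriptors (gist plus color) over millions of photos, then blends the chosen patch using graph-cut seam finding and Poisson blending. As an illustrative stand-in, this sketch ranks candidates by the sum of squared differences over the *known* pixels only. It fills the hole by copying pixels directly from the best match, or from a user-chosen rank. `masked_ssd`, `rank_candidates` and `complete_scene` are hypothetical names.

```python
from typing import List, Optional

Image = List[List[float]]  # grayscale, row-major, all images the same size
Mask = List[List[bool]]    # True where the pixel was removed (the hole)


def masked_ssd(a: Image, b: Image, hole: Mask) -> float:
    """Sum of squared differences computed only over pixels outside the hole."""
    return sum(
        (a[y][x] - b[y][x]) ** 2
        for y in range(len(a))
        for x in range(len(a[0]))
        if not hole[y][x]
    )


def rank_candidates(query: Image, hole: Mask, database: List[Image]) -> List[int]:
    """Step (c): database indices sorted from most to least similar."""
    return sorted(range(len(database)), key=lambda i: masked_ssd(query, database[i], hole))


def complete_scene(query: Image, hole: Mask, database: List[Image],
                   choice: Optional[int] = None) -> Image:
    """Step (d): fill the hole with pixels from a ranked candidate.

    `choice` stands in for the user's pick (a rank in the sorted list);
    None takes the top-ranked image. No seam finding or blending is done.
    """
    ranked = rank_candidates(query, hole, database)
    source = database[ranked[0] if choice is None else ranked[choice]]
    return [
        [source[y][x] if hole[y][x] else query[y][x] for x in range(len(query[0]))]
        for y in range(len(query))
    ]


if __name__ == "__main__":
    query = [[10, 10, 10], [10, 0, 10], [10, 10, 10]]  # centre pixel removed
    hole = [[False, False, False], [False, True, False], [False, False, False]]
    database = [
        [[50, 50, 50], [50, 50, 50], [50, 50, 50]],  # dissimilar surroundings
        [[11, 10, 9], [10, 42, 10], [9, 10, 11]],    # similar surroundings
    ]
    print(complete_scene(query, hole, database))
```

In this toy case the second database image has the closer surroundings, so its centre value fills the hole. The comparison ignores the hole because those pixels no longer carry information about the scene.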
77.
References
[Tomasi1998] Tomasi, C., & Manduchi, R. (1998). Bilateral filtering for gray and
color images. In International Conference on Computer Vision (ICCV).
[Buades2005] Buades, A., Coll, B., & Morel, J.-M. (2005). A non-local algorithm for
image denoising. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Dabov2007] Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image
denoising by sparse 3D transform-domain collaborative filtering. IEEE
Transactions on Image Processing, 16(8), 2080–2095.
[Freeman2002] Freeman, W. T., Jones, T. R., & Pasztor, E. C. (2002). Example-based
super-resolution. IEEE Computer Graphics and Applications, 22(2), 56–65.
[Farsiu2003] Farsiu, S., Robinson, D., Elad, M., & Milanfar, P. (2003). Fast and
robust super-resolution. In IEEE International Conference on Image Processing.
[Mitzel2009] Mitzel, D., Pock, T., Schoenemann, T., & Cremers, D. (2009). Video
Super Resolution using Duality Based TV-L1 Optical Flow. In DAGM Symposium
on Pattern Recognition (pp. 432–441).
[Yang2008] Yang, J., Wright, J., Ma, Y., & Huang, T. (2008). Image super-resolution
as sparse representation of raw image patches. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
78.
References
[Avidan2007] Avidan, S., & Shamir, A. (2007). Seam carving for
content-aware image resizing. In Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH).
[Agarwala2004] Agarwala, A., Dontcheva, M., Agrawala, M., Drucker,
S., Colburn, A., Curless, B., … Cohen, M. (2004). Interactive digital
photomontage. In Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH) (Vol. 23).
[Barnes2009] Barnes, C., Shechtman, E., Finkelstein, A., & Goldman,
D. B. (2009). PatchMatch: A randomized correspondence algorithm
for structural image editing. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH).
[Bertalmio2000] Bertalmio, M., Sapiro, G., Caselles, V., &
Ballester, C. (2000). Image inpainting. In Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH) (pp. 417–424).
79.
References
[Brown2003] Brown, M., & Lowe, D. G. (2003). Recognising
Panoramas. In International Conference on Computer Vision
(ICCV).
[Chen2009] Chen, T., Cheng, M.-M., Tan, P., Shamir, A., & Hu,
S.-M. (2009). Sketch2Photo: internet image montage. In
Conference on Computer Graphics and Interactive
Techniques (SIGGRAPH).
[Criminisi2004] Criminisi, A., Pérez, P., & Toyama, K. (2004).
Region filling and object removal by exemplar-based image
inpainting. IEEE Transactions on Image Processing, 13(9),
1200–1212.
[Hays2007] Hays, J., & Efros, A. A. (2007). Scene completion
using millions of photographs. In Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH).
80.
References
[Pérez2003] Pérez, P., Gangnet, M., & Blake, A. (2003).
Poisson image editing. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH).
[Rother2004] Rother, C., Kolmogorov, V., & Blake, A. (2004).
GrabCut: Interactive foreground extraction using iterated
graph cuts. In Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH).
89.
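How specific object recognition works
Representative methods
Local features such as SIFT + approximate nearest-neighbor search [Lowe1999]
For large-scale databases, Bag-of-Features is used [Sivic2003]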
(Figure: local features, each described by a histogram of gradient orientations, are matched against a database (DB) of features; matching + voting)
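The matching-plus-voting step can be sketched as follows. The sketch assumes local descriptors (for example SIFT vectors) have already been extracted, and that each database descriptor is tagged with the ID of the image it came from. A brute-force search stands in for approximate nearest-neighbor search. A distance-ratio test of the kind popularized by Lowe's SIFT work discards ambiguous matches, and each surviving match casts one vote for its database image. The 0.8 threshold, the toy 2D descriptors and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Descriptor = Sequence[float]
Database = List[Tuple[Hashable, Descriptor]]  # (image_id, descriptor) pairs


def nearest_two(q: Descriptor, db: Database) -> Tuple[Optional[Hashable], float, float]:
    """Return (image_id of nearest neighbor, nearest distance, second-nearest distance).

    Brute force over the whole database; a real system would use an ANN index.
    """
    best_id, d1, d2 = None, math.inf, math.inf
    for image_id, d in db:
        dist = math.dist(q, d)
        if dist < d1:
            best_id, d1, d2 = image_id, dist, d1  # old best becomes second best
        elif dist < d2:
            d2 = dist
    return best_id, d1, d2


def vote(query_descs: List[Descriptor], db: Database, ratio: float = 0.8) -> Counter:
    """Each unambiguous match (d1 < ratio * d2) casts one vote for its image."""
    votes: Counter = Counter()
    for q in query_descs:
        image_id, d1, d2 = nearest_two(q, db)
        if image_id is not None and d1 < ratio * d2:
            votes[image_id] += 1
    return votes


if __name__ == "__main__":
    db = [("mug", (0.0, 0.0)), ("mug", (10.0, 0.0)), ("book", (0.0, 10.0)), ("book", (10.0, 10.0))]
    query = [(0.1, 0.2), (9.8, 0.1), (5.0, 5.0)]  # the last one is equidistant, hence ambiguous
    print(vote(query, db).most_common(1))
```

The image with the most votes is the recognition result. In practice a geometric consistency check on the matches usually follows before the result is accepted.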
94.
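Features for object detection (human detection and others)
Deformable Part Model [Felzenszwalb2009]
Combines multiple HOG features to improve detection accuracy
Also learns how the positions where the HOG features are extracted can shift (deformation), using a machine learning algorithm called Latent SVM
Credit: [Felzenszwalb2009]
(Figure panels: Root filter, Parts filter, Deformation)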
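In the deformable part model, a placement is scored as follows: the root filter response, plus, for each part, the best part-filter response minus a quadratic deformation cost for displacing the part from its anchor, plus a bias [Felzenszwalb2009]. The sketch below evaluates that score at one root location. It assumes the filter response maps are already computed as 2D lists at a single resolution. The paper places parts at twice the root resolution and uses a generalized distance transform instead of this brute-force maximum. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # filter response map, indexed [y][x]


def part_score(response: Grid, anchor: Tuple[int, int], w: Sequence[float]) -> float:
    """max over positions of (part response - deformation cost).

    The cost is w . (dx, dy, dx^2, dy^2) for a displacement (dx, dy) from the anchor.
    Brute force over the whole map; the paper uses a distance transform instead.
    """
    ax, ay = anchor
    best = float("-inf")
    for y, row in enumerate(response):
        for x, r in enumerate(row):
            dx, dy = x - ax, y - ay
            cost = w[0] * dx + w[1] * dy + w[2] * dx * dx + w[3] * dy * dy
            best = max(best, r - cost)
    return best


def dpm_score(root: Grid, parts, root_pos: Tuple[int, int], bias: float = 0.0) -> float:
    """Root response at root_pos + sum of best part placements + bias.

    `parts` is a list of (response_map, anchor_offset, deformation_weights);
    each anchor offset is relative to the root position.
    """
    rx, ry = root_pos
    total = root[ry][rx] + bias
    for response, (ox, oy), w in parts:
        total += part_score(response, (rx + ox, ry + oy), w)
    return total


if __name__ == "__main__":
    root = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
    part = [[0, 0, 0], [0, 0, 0], [0, 0, 5]]  # strongest response one step diagonal from the anchor
    print(dpm_score(root, [(part, (0, 0), (0, 0, 1, 1))], root_pos=(1, 1)))
```

In the example the part prefers to move one cell diagonally: it gains 5 in response and pays a deformation cost of 2. Trading response against displacement in this way is what lets the model tolerate pose variation.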
95.
References
[Csurka2004] Csurka, G., Dance, C. R., Fan, L., Willamowski,
J., & Bray, C. (2004). Visual categorization with bags of
keypoints. In Workshop on Statistical Learning in Computer
Vision, ECCV (Vol. 1, p. 22).
[Dalal2005] Dalal, N., & Triggs, B. (2005). Histograms of
Oriented Gradients for Human Detection. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
[Felzenszwalb2009] Felzenszwalb, P. F., Girshick, R. B.,
McAllester, D., & Ramanan, D. (2009). Object detection with
discriminatively trained part-based models. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
32(9), 1627–1645.
[Lowe1999] Lowe, D. G. (1999). Object recognition from local
scale-invariant features. In IEEE International Conference on
Computer Vision (Vol. 2, pp. 1150–1157).
96.
References
[Sivic2003] Sivic, J., & Zisserman, A. (2003). Video Google: a
text retrieval approach to object matching in videos. In IEEE
International Conference on Computer Vision (ICCV).
[Viola2001] Viola, P., & Jones, M. (2001). Rapid object
detection using a boosted cascade of simple features. In IEEE
Conference on Computer Vision and Pattern
Recognition (CVPR).
116.
Application: generic object recognition
ILSVRC2012 results
http://www.image-net.org/challenges/LSVRC/2012/
Classification
Rank  Team name     Error
1     Super Vision  0.15315
2     Super Vision  0.16422
3     ISI           0.26172
4     ISI           0.26602
5     ISI           0.26646
6     ISI           0.26952
7     OXFORD_VGG    0.26979
8     XRCE/INRIA    0.27058
Detection
Rank  Team name     Error
1     Super Vision  0.335463
2     Super Vision  0.341905
3     OXFORD_VGG    0.500342
4     OXFORD_VGG    0.50139
5     OXFORD_VGG    0.522189
6     OXFORD_VGG    0.529482
7     ISI           0.536474
8     ISI           0.536546
Deep learning: Super Vision
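In ILSVRC2012 the classification error is a top-5 error. An image counts as correctly classified if its ground-truth label is among the model's five highest-scoring classes. A minimal sketch of that metric follows, assuming per-image class scores are given as dictionaries. The function name and the toy data are illustrative.

```python
from typing import Dict, Sequence


def top_k_error(scores: Sequence[Dict[str, float]], labels: Sequence[str], k: int = 5) -> float:
    """Fraction of images whose true label is not among the k highest-scoring classes."""
    if not labels:
        raise ValueError("need at least one labelled image")
    misses = 0
    for class_scores, label in zip(scores, labels):
        top_k = sorted(class_scores, key=class_scores.get, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    scores = [
        {"cat": 0.6, "dog": 0.3, "car": 0.1},
        {"cat": 0.5, "dog": 0.4, "car": 0.1},  # true label "dog" is only second
    ]
    labels = ["cat", "dog"]
    print(top_k_error(scores, labels, k=1), top_k_error(scores, labels, k=2))
```

Allowing k guesses per image makes the metric lenient toward plausible confusions, such as similar dog breeds, among ImageNet's 1,000 classes.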
117.
Application: generic object recognition
Networks tend to get deeper in order to improve performance
Residual Net [He2015]
152 layers
GoogLeNet [Szegedy2014]
22 layers
VGG Net [Simonyan2014]
19 layers
Alex Net [Krizhevsky2012]
8 layers
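A key ingredient that makes the 152-layer Residual Net trainable is the residual (shortcut) connection [He2015]. Each block outputs relu(F(x) + x). A block whose F learns to output zero therefore passes its input through unchanged, at least for non-negative inputs after the final ReLU. The pure-Python sketch below assumes F is two small fully connected layers with a ReLU in between. Real ResNets use convolutions and batch normalization, and the weights here are purely illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, W1: Matrix, W2: Matrix) -> Vector:
    """y = relu(F(x) + x) with F(x) = W2 relu(W1 x).

    The identity shortcut requires F(x) to have the same size as x.
    """
    f = matvec(W2, relu(matvec(W1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


def deep_net(x: Vector, blocks: Sequence[Tuple[Matrix, Matrix]]) -> Vector:
    """Apply a stack of residual blocks in order."""
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x


if __name__ == "__main__":
    zero = [[0.0, 0.0], [0.0, 0.0]]
    # Fifty blocks whose residual branch outputs zero leave the input unchanged.
    print(deep_net([1.0, 2.0], [(zero, zero)] * 50))
```

Without the shortcut, the same stack of zero-weight layers would map every input to zero. Because each block only has to learn a correction to the identity, adding depth is less likely to make optimization harder.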
128.
References
[Deng2009] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L.
(2009). ImageNet: A large-scale hierarchical image database. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
[Dong2014] Dong, C., Loy, C. C., He, K., & Tang, X. (2014). Image Super-
Resolution Using Deep Convolutional Networks. In European
Conference on Computer Vision (ECCV).
[Girshick2014] Girshick, R., Donahue, J., Darrell, T., & Malik, J.
(2014). Rich feature hierarchies for accurate object detection and
semantic segmentation. In IEEE Conference on Computer Vision
and Pattern Recognition.
[Girshick2015] Girshick, R. (2015). Fast R-CNN. In IEEE International
Conference on Computer Vision (ICCV) (pp. 1440–1448).
[He2015] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep
Residual Learning for Image Recognition. arXiv preprint
arXiv:1512.03385.
129.
References
[Iizuka2016] Iizuka, S., Simo-Serra, E., & Ishikawa, H. (2016). Let there be
Color!: Joint End-to-end Learning of Global and Local Image Priors for
Automatic Image Colorization with Simultaneous Classification. ACM
Transactions on Graphics (SIGGRAPH).
[Krizhevsky2012] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. In
Advances in Neural Information Processing Systems (NIPS) (pp. 1106–1114).
[Long2014] Long, J., Shelhamer, E., & Darrell, T. (2014). Fully
Convolutional Networks for Semantic Segmentation. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR 2015) (pp. 3431–3440).
[Radford2015] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial
Networks. arXiv preprint arXiv:1511.06434.
[Ren2015] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks.
Advances in Neural Information Processing Systems (NIPS).
130.
References
[Simonyan2014] Simonyan, K., & Zisserman, A. (2014). Very Deep
Convolutional Networks for Large-Scale Image Recognition. arXiv preprint
arXiv:1409.1556.
[Simo-Serra2016] Simo-Serra, E., Iizuka, S., Sasaki, K., & Ishikawa, H.
(2016). Learning to Simplify: Fully Convolutional Networks for Rough
Sketch Cleanup. ACM Transactions on Graphics (SIGGRAPH).
[Szegedy2014] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., … Rabinovich, A. (2014). Going Deeper with Convolutions.
arXiv preprint arXiv:1409.4842.
[Taigman2014] Taigman, Y., Ranzato, M. A., & Wolf, L. (2014). DeepFace:
Closing the Gap to Human-Level Performance in Face Verification. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[Uijlings2013] Uijlings, J. R. R., Van De Sande, K. E. A., Gevers, T., &
Smeulders, A. W. M. (2013). Selective search for object recognition.
International Journal of Computer Vision, 104(2), 154–171.
[Vinyals2015] Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015).
Show and Tell: A Neural Image Caption Generator. In IEEE Conference
on Computer Vision and Pattern Recognition.
167.
References
[Engel2014] Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM:
Large-Scale Direct Monocular SLAM. In European
Conference on Computer Vision (pp. 834–849).
[Klein2007] Klein, G., & Murray, D. (2007). Parallel tracking and
mapping for small AR workspaces. In 6th IEEE and ACM
International Symposium on Mixed and Augmented Reality (ISMAR).
[Newcombe2011a] Newcombe, R. A., Lovegrove, S. J., & Davison,
A. J. (2011). DTAM: Dense Tracking and Mapping in Real-Time. In
International Conference on Computer Vision (pp. 2320–2327).
[Newcombe2011b] Newcombe, R. A., Davison, A. J., Izadi, S., Kohli,
P., Hilliges, O., Shotton, J., … Fitzgibbon, A. (2011). KinectFusion:
Real-time dense surface mapping and tracking. 2011 10th IEEE
International Symposium on Mixed and Augmented Reality, 127–
136.
168.
References
[Newcombe2015] Newcombe, R. A., Fox, D., & Seitz, S. M. (2015).
DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes
in Real-Time. Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 343–352.
[Shotton2011] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T.,
Finocchio, M., Moore, R., … Blake, A. (2011). Real-time human
pose recognition in parts from single depth images. In IEEE
Conference on Computer Vision and Pattern Recognition.
169.
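Applications in driver assistance / autonomous driving
A self-driving car is packed with sensors
GPS, LiDAR, radar, stereo cameras, rotary encoders, etc.
Reference: the world as the sensors see it (from around 7:40)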
https://www.youtube.com/watch?v=tiwVMrTLUWg
https://www.google.com/selfdrivingcar
Toyota Motor Co.
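Of these sensors, a stereo camera measures distance by triangulation. Take a rectified camera pair with focal length f (in pixels) and baseline B (in metres). A point whose images differ by a disparity of d pixels lies at depth Z = fB/d. The sketch below applies that formula. The focal length, baseline and disparity values are illustrative assumptions, not the specifications of any particular vehicle.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero corresponds to a point at infinity)")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    f_px, baseline = 1000.0, 0.3  # hypothetical camera parameters
    for d in (100.0, 10.0, 9.0):
        print(d, depth_from_disparity(d, f_px, baseline))
```

Because Z is inversely proportional to d, a one-pixel disparity error matters much more for distant objects. With these numbers, 10 px corresponds to 30 m and 9 px to about 33 m. This is one reason long-range perception also relies on LiDAR and radar.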
178.
References
[Banz2010] Banz, C., Hesselbarth, S., Flatt, H., Blume, H., & Pirsch,
P. (2010). Real-time stereo vision system using semi-global
matching disparity estimation: Architecture and FPGA-
implementation. Proceedings - 2010 International Conference on
Embedded Computer Systems: Architectures, Modeling and
Simulation, IC-SAMOS 2010, 93–101.
[Huval2015] Huval, B., Wang, T., Tandon, S., Kiske, J., Song, W.,
Pazhayampallil, J., … Ng, A. Y. (2015). An Empirical Evaluation of
Deep Learning on Highway Driving. arXiv preprint arXiv:1504.01716.
[Kammel2008] Kammel, S., & Pitzer, B. (2008). Lidar-based lane
marker detection and mapping. IEEE Intelligent Vehicles
Symposium, 1137–1142.
[Scharwaechter2014] Scharwaechter, T., Enzweiler, M., Franke, U.,
& Roth, S. (2014). Stixmantics: A Medium-Level Model for Real-
Time Semantic Scene Understanding. European Conference on
Computer Vision, 8693, 533–548.
179.
References
[Sermanet2011] Sermanet, P., & LeCun, Y. (2011). Traffic Sign
Recognition with Multi-Scale Convolutional Networks. International Joint
Conference on Neural Networks (IJCNN), 2809–2813.
[Teichman2011] Teichman, A., Levinson, J., & Thrun, S. (2011). Towards
3D object recognition via classification of arbitrary object tracks.
Proceedings - IEEE International Conference on Robotics and
Automation, 4034–4041.
[Time2008] Aly, M. (2008). Real Time Detection of Lane Markers in
Urban Streets. In IEEE Intelligent Vehicles Symposium (pp. 7–12).
[Wang2011] Wang, C., Jin, T., Yang, M., & Wang, B. (2011). Robust and
Real-Time Traffic Lights Recognition in Complex Urban Environments.
International Journal of Computational Intelligence Systems, 4(6), 1383.
[Ziegler2014] Ziegler, J., Lategahn, H., Schreiber, M., Keller, C. G.,
Knöppel, C., Hipp, J., … Stiller, C. (2014). Video Based Localization for
BERTHA. In IEEE Intelligent Vehicles Symposium (IV) (pp. 1231–1238).
194.
Web API
Google Cloud Vision API
Generic object recognition, face detection, facial expression recognition, logos, landmarks, harmful content, text recognition
https://cloud.google.com/vision/
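Microsoft Cognitive Services
Face detection, facial expression recognition, age/gender recognition, face verification, generic object recognition,
adult image classification, motion detection, face tracking, video thumbnail generation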
https://www.microsoft.com/cognitive-services/
IBM Watson Visual Recognition
Face detection, age/gender recognition, celebrity identification, generic object recognition
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html
195.
Web API
PUX Developers Site
Face detection, face recognition (verification), object recognition (specific object recognition),
online handwritten character recognition
http://pux.co.jp/api_sdk/
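ゼータ・ブリッジ (Zeta Bridge), フォトナビ (PhotoNavi)
Face detection, facial landmark detection, facial attribute estimation (age, gender, smile),
matching search (specific object recognition)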
http://biz.photonavi.jp/
Face++
Face detection, face verification, facial landmark detection, facial attribute estimation (age, gender,
race, smile)
http://www.faceplusplus.com/
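Services like these are typically called over HTTPS with a JSON body. As one illustration, the sketch below builds, but does not send, a label-detection request for the Google Cloud Vision REST method `images:annotate`, with the image sent inline as base64. The API key is a placeholder, and authentication options and limits may change. The other services listed use their own endpoints and formats, so consult each provider's current documentation. `build_label_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, api_key: str, max_results: int = 5) -> urllib.request.Request:
    """Build a POST request asking Cloud Vision for label detection on one image."""
    body = {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request needs network access and a valid key, e.g.:
#   with urllib.request.urlopen(build_label_request(data, "YOUR_API_KEY")) as resp:
#       result = json.load(resp)
#   labels = [a["description"] for a in result["responses"][0].get("labelAnnotations", [])]
```

Other detections from the Google Cloud Vision list above are requested through further feature types, such as `FACE_DETECTION` or `TEXT_DETECTION`, in the same `features` array.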