Introduction to Papers and Tools for Action Recognition

This deck introduces
"Learning Spatiotemporal Features with 3D Convolutional Networks",
a paper on action recognition.


  1. D3
  3. K. Soomro et al., "UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild", CRCV-TR-12-01, 2012.
  4. CNNs applied to RGB frames and optical flow. K. Simonyan and A. Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos", NIPS, 2014.
  5. UCF101: released in 2012; 101 action classes; 13,320 videos; collected from YouTube. K. Soomro et al., "UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild", CRCV-TR-12-01, 2012.
  6. HMDB51: released in 2011; 51 action classes; 6,766 videos; collected from YouTube and other web sources. Example classes: hand-waving, drinking, sword fighting, diving, running, kicking. H. Kuehne et al., "HMDB: A Large Video Database for Human Motion Recognition", ICCV, 2011.
  7. State of the art: 95% on UCF101, with deep-learning-based methods. Z. Lan et al., "Deep Local Video Feature for Action Recognition", arXiv, 2017.
  9. "Learning Spatiotemporal Features with 3D Convolutional Networks", D. Tran et al. (FAIR, Dartmouth College), ICCV 2015, Poster. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.pdf
  10. Overview: extending the CNN to a 3D CNN (C3D) built on 3D convolutions.
  11. 3D network architecture. 3D Conv: 3×3×3 with stride 1. Pool: 1×2×2 (pool1), 2×2×2 (pool2-5). Diagram labels: Two-stream CNN, C3D.
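  12. Training setup (1): Sports-1M (about 1 million videos, 487 classes). Five 2-second clips are extracted from each training video, resized to 128×171, and randomly cropped to 16×112×112; clips are flipped horizontally with 50% probability.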
  13. Training setup (2): mini-batch size 30; initial learning rate 0.003, multiplied by 1/2 every 150K iterations.
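  14. C3D video features: the video is split into 16-frame clips with an 8-frame overlap, each clip is passed through C3D to obtain its fc6 activations, and the per-clip fc6 vectors are averaged into a single C3D feature for the video.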
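  15. Evaluation on UCF101 with three C3D networks: trained on I380K, trained on Sports-1M, and trained on I380K then fine-tuned on Sports-1M. The C3D features are classified with an SVM.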
  16. C3D and iDT (improved Dense Trajectories).
  17. iDT and Brox's method (optical flow).
  19. Project page: http://www.cs.dartmouth.edu/~dutran/c3d/. Caffe-based code on GitHub: https://github.com/facebook/C3D. Model pre-trained on Sports-1M: https://goo.gl/IG5CL0
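  20. Installation follows the usual Caffe procedure: git clone the repository, edit Makefile.config, then run make all, make test, make runtest. The feature extraction example is in C3D_HOME/examples/c3d_feature_extraction.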
  21. In C3D_HOME/examples/c3d_feature_extraction, run c3d_sport1m_feature_extraction_frm.sh or c3d_sport1m_feature_extraction_video.sh. If files appear under output/c3d/..., the setup is OK.
  22. Output: output/c3d/video_name contains 000000.fc6-1, 000016.fc6-1, ..., 000000.fc7-1, ..., 000000.prob, ... The fc6 and fc7 files are binary and can be read in MATLAB with script/read_binary_blob.m.
  23. Binary file layout: five int32 values (num, channel, length, height, width), followed by the data as float values.
  24. Reading a feature file in Python:

        import struct
        import numpy as np

        # Header: five int32 sizes; body: that many float32 values.
        with open('000000.fc6-1', 'rb') as input_file:
            sizes = [struct.unpack('i', input_file.read(4))[0] for i in range(5)]
            m = int(np.prod(sizes))
            data = [struct.unpack('f', input_file.read(4))[0] for i in range(m)]
  25. Feature extraction script:

        mkdir -p output/c3d/v_ApplyEyeMakeup_g01_c01
        mkdir -p output/c3d/v_BaseballPitch_g01_c01
        GLOG_logtosterr=1 ../../build/tools/extract_image_features.bin \
            prototxt/c3d_sport1m_feature_extractor_video.prototxt \
            conv3d_deepnetA_sport1m_iter_1900000 0 50 1 \
            prototxt/output_list_video_prefix.txt fc7-1 fc6-1 prob
  26. Usage of extract_image_features.bin:

        extract_image_features.bin <feature_extractor_prototxt_file> <c3d_pre_trained_model> <gpu_id> <mini_batch_size> <number_of_mini_batches> <output_prefix_file> <feature_name1> <feature_name2> ...

  27. <feature_extractor_prototxt_file>: the prototxt file that defines the feature extraction network.
  28. <c3d_pre_trained_model>: the pre-trained C3D model.
  29. <gpu_id>: GPU ID.
  30. <mini_batch_size>: mini-batch size.
  31. <number_of_mini_batches>: number of mini-batches. Number of clips processed = mini-batch size × number of mini-batches.
  32. <output_prefix_file>: file listing the output prefixes.
  33. <feature_name1> <feature_name2> ...: names of the features to extract (fc6, fc7, prob, and so on).
  34. input_list_video.txt:

        input/avi/v_ApplyEyeMakeup_g01_c01.avi 0 0
        input/avi/v_ApplyEyeMakeup_g01_c01.avi 16 0
        input/avi/v_ApplyEyeMakeup_g01_c01.avi 32 0
        input/avi/v_BaseballPitch_g01_c01.avi 0 0
        input/avi/v_BaseballPitch_g01_c01.avi 16 0
  35. output_list_video_prefix.txt (one line per entry of the input list, in the same order):

        output/c3d/v_ApplyEyeMakeup_g01_c01/000000
        output/c3d/v_ApplyEyeMakeup_g01_c01/000016
        output/c3d/v_ApplyEyeMakeup_g01_c01/000032
        output/c3d/v_BaseballPitch_g01_c01/000000
        output/c3d/v_BaseballPitch_g01_c01/000016
  36. 200 1.3 467s 17s 2s
  37. User Guide: https://goo.gl/4WXrMY. Matlab, Python.
  38. 3D CNN, GPU
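
The pooling sizes on slide 11 determine how a 16×112×112 input shrinks through the network. Below is a minimal sketch of that arithmetic. It assumes that each pooling layer uses a stride equal to its kernel, that output sizes are rounded up as in Caffe's pooling layer, and that the 3×3×3 convolutions are padded so they preserve size. None of these assumptions is stated on the slide, and the function names are illustrative.

```python
import math

# Pooling kernels (length, height, width) from slide 11: pool1, then pool2-5.
POOL_KERNELS = [(1, 2, 2)] + [(2, 2, 2)] * 4


def pool_output_shape(shape, kernel):
    """Output shape of one pooling layer, assuming stride == kernel.

    Sizes are rounded up (assumed Caffe-style), so an odd size keeps
    its last partial window.
    """
    return tuple(math.ceil((s - k) / k) + 1 for s, k in zip(shape, kernel))


def c3d_pool_shapes(input_shape=(16, 112, 112)):
    """Shapes after pool1..pool5, assuming convolutions preserve size."""
    shapes = []
    shape = input_shape
    for kernel in POOL_KERNELS:
        shape = pool_output_shape(shape, kernel)
        shapes.append(shape)
    return shapes


if __name__ == "__main__":
    for i, shape in enumerate(c3d_pool_shapes(), start=1):
        print(f"pool{i}: {shape}")
```

Under these assumptions, pool1 keeps all 16 frames and only halves the spatial size, and pool5 leaves a 1×4×4 volume per channel.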
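
The input jittering on slide 12 (a random 16×112×112 crop from a 128×171 clip, plus a horizontal flip with 50% probability) could be sketched in NumPy as follows. The array layout (frames, height, width, channels) and the function name are assumptions made for illustration, not the C3D implementation.

```python
import numpy as np


def random_crop_and_flip(clip, rng, crop_len=16, crop_size=112, flip_prob=0.5):
    """Randomly crop a clip in time and space, then maybe flip it.

    clip: array shaped (frames, height, width, channels), for example
    (T, 128, 171, 3) with T >= crop_len. This layout is an assumption.
    """
    frames, height, width, _ = clip.shape
    t0 = rng.integers(0, frames - crop_len + 1)    # temporal offset
    y0 = rng.integers(0, height - crop_size + 1)   # vertical offset
    x0 = rng.integers(0, width - crop_size + 1)    # horizontal offset
    crop = clip[t0:t0 + crop_len, y0:y0 + crop_size, x0:x0 + crop_size]
    if rng.random() < flip_prob:
        crop = crop[:, :, ::-1]  # reverse the width axis (horizontal flip)
    return crop


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = np.zeros((32, 128, 171, 3), dtype=np.uint8)
    print(random_crop_and_flip(clip, rng).shape)
```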
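
The learning-rate rule on slide 13 (start at 0.003, halve every 150K iterations) is a step schedule. A minimal sketch, with an illustrative function name:

```python
def learning_rate(iteration, base_lr=0.003, step=150_000, factor=0.5):
    """Step schedule: multiply base_lr by factor once per completed step."""
    return base_lr * factor ** (iteration // step)


if __name__ == "__main__":
    for it in (0, 150_000, 300_000):
        print(it, learning_rate(it))
```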
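
Slide 14 describes splitting a video into 16-frame clips with an 8-frame overlap and averaging the per-clip fc6 vectors. The sketch below assumes the fc6 vectors are already available as NumPy arrays. The final L2 normalization follows the paper's description rather than the slide, and the function names are illustrative.

```python
import numpy as np


def clip_starts(num_frames, clip_len=16, overlap=8):
    """Start frames of consecutive clips that overlap by `overlap` frames."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))


def video_descriptor(clip_features):
    """Average per-clip feature vectors, then L2-normalize the result.

    The normalization step is an assumption taken from the paper.
    """
    mean = np.asarray(clip_features, dtype=np.float32).mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean


if __name__ == "__main__":
    starts = clip_starts(40)
    print(starts)
    # Stand-in features: one random 4096-dimensional vector per clip.
    feats = np.random.default_rng(0).random((len(starts), 4096))
    print(video_descriptor(feats).shape)
```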
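
The reader on slide 24 unpacks one float at a time. A possibly faster variant uses np.fromfile to read the header and the data in two calls, then reshapes to the five header dimensions from slide 23. This sketch assumes the file was written in the machine's native byte order. The helper write_blob is hypothetical and exists only to make the example testable without C3D output.

```python
import numpy as np


def write_blob(path, array):
    """Write a 5-D array in the slide-23 layout (hypothetical test helper)."""
    array = np.asarray(array, dtype=np.float32)
    with open(path, "wb") as f:
        np.asarray(array.shape, dtype=np.int32).tofile(f)  # five int32 sizes
        array.tofile(f)                                    # float32 data


def read_blob(path):
    """Read five int32 sizes, then the float32 data, and reshape it."""
    with open(path, "rb") as f:
        sizes = np.fromfile(f, dtype=np.int32, count=5)
        count = int(sizes.prod())
        data = np.fromfile(f, dtype=np.float32, count=count)
    if data.size != count:
        raise ValueError("file is shorter than its header claims")
    return data.reshape(tuple(int(s) for s in sizes))
```

For an fc6 file, calling .ravel() on the result of read_blob gives the feature as a flat vector.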
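
The two list files on slides 34 and 35 and the batch arithmetic on slide 31 can be generated together. The sketch below assumes the input-list columns are video path, start frame, and a label that stays 0 for feature extraction, and that clips start every 16 frames as in the slides. The frame counts in the example are made up, and the function names are illustrative.

```python
import math


def build_lists(videos, clip_len=16, input_dir="input/avi", output_dir="output/c3d"):
    """Build matching input-list and output-prefix lines.

    videos: list of (video_name, num_frames) pairs.
    """
    input_lines, output_lines = [], []
    for name, num_frames in videos:
        for start in range(0, num_frames - clip_len + 1, clip_len):
            # Assumed columns: <path> <start frame> <label (0, unused)>.
            input_lines.append(f"{input_dir}/{name}.avi {start} 0")
            output_lines.append(f"{output_dir}/{name}/{start:06d}")
    return input_lines, output_lines


def num_mini_batches(num_clips, mini_batch_size):
    """Smallest batch count with mini_batch_size * batches >= num_clips."""
    return math.ceil(num_clips / mini_batch_size)


if __name__ == "__main__":
    # Hypothetical frame counts chosen to reproduce the lists in the slides.
    videos = [("v_ApplyEyeMakeup_g01_c01", 48), ("v_BaseballPitch_g01_c01", 32)]
    inputs, outputs = build_lists(videos)
    print("\n".join(inputs))
    print("\n".join(outputs))
    print(num_mini_batches(len(inputs), 50))
```

With these five clips and a mini-batch size of 50, one mini-batch is enough, which matches the `50 1` arguments in the script on slide 25.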
