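A Survey of 100 CVPR Presentation Videos

MIRU 2022 Young Researchers Program (若手プログラム), Group 5
米田駿介 (Tottori University)
Dhammatorn Wisan (Aoyama Gakuin University)
山﨑啓太 (University of Tsukuba)
山下優樹 (The University of Tokyo)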

• Watched 100 oral session videos from CVPR, the top conference in computer vision (CV)
• Report findings on presentation techniques and slide characteristics
Overview
Abstract 2
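• What makes a good presentation?
• Survey of presentation techniques
• cvpaper.challenge
• Survey method
• Survey results
• Analysis of each section
• Number of presentations with a demo
• Number of presentations containing each section
• Order of sections
• Insights from the survey
• cvpresentation.challenge
• Bonus
• Recommended presentation videos
• Recommended reading on presentations
• Appendix
Table of contents
Outline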

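• At a conference, research results must be conveyed effectively in a short time
• Get the audience interested in the results
• This leads to citations and further development of the research
• What kind of presentation counts as "good"?
• Speaking style and gestures
• Structure and design of the presentation materials
• Time allocation
• Choosing what to include and what to leave out
What makes a good presentation?
Motivation 3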

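• Grasps research trends through exhaustive surveys of top conferences such as CVPR
• Creates new trends through research activity and paper submissions
→ Surveys what makes a "good" paper or research project in the CV field
cvpaper.challenge
Related Work 4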

cvpaper.challenge
Related Work 5
Few surveys address what makes a "good" presentation in the CV field

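• Explore how to convey research results effectively in a short time
• Watch a large number of presentation videos from a top conference
• Investigate trends and techniques in delivery and slide materials
Survey of presentation techniques
Purpose 6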

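Target: oral session videos (5 minutes each) from CVPR 2019 and CVPR 2020
• CVPR 2019: held in person
• CVPR 2020: held online; presenters submitted presentation videos
The videos were divided into 5 categories, and 10 videos were watched per category per year (2 years × 5 categories × 10 videos = 100 videos in total)
• A: Architecture, Representation, Theory & Optimization
• B: Recognition, Understanding, Segmentation & Retrieval
• C: Language, Reasoning, Body & Applications
• D: Synthesis
• E: Video & 3D
Slides were classified into 8 sections, and the presentation time, number of slides, speaking style, and notable techniques were recorded
• Title, Overview, Backgrounds, Related works, Methods, Experiments (including Results), Future works (including Limitation), Conclusion
Survey method
Method 7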
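The survey design can be sketched as a small data structure. This is a minimal sketch, not the authors' actual tooling: the class name VideoRecord, its field names, and the helper empty_survey_sheet are hypothetical, chosen only to illustrate the 2 years × 5 categories × 10 videos layout and the 8 slide sections.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional

YEARS = (2019, 2020)
CATEGORIES = ("A", "B", "C", "D", "E")
VIDEOS_PER_CELL = 10  # videos watched per category per year
SECTIONS = (
    "Title", "Overview", "Backgrounds", "Related works",
    "Methods", "Experiments", "Future works", "Conclusion",
)


@dataclass
class VideoRecord:
    """One surveyed video (hypothetical record layout, assumed for illustration)."""
    year: int
    category: str
    index: int  # 1..10 within its (year, category) cell
    duration_sec: Optional[float] = None  # presentation time, filled in while watching
    slides_per_section: Dict[str, int] = field(
        default_factory=lambda: dict.fromkeys(SECTIONS, 0)
    )
    speaking_style: str = ""
    notes: str = ""  # notable techniques


def empty_survey_sheet() -> List[VideoRecord]:
    """Create one blank record for every video in the survey design."""
    return [
        VideoRecord(year, category, index)
        for year, category in product(YEARS, CATEGORIES)
        for index in range(1, VIDEOS_PER_CELL + 1)
    ]


sheet = empty_survey_sheet()
print(len(sheet))  # 100
```

Each category ends up with 20 records (10 per year), which is the denominator used when reading per-category counts.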

Analysis of each section
Result 8
[Figure: per-section results for each category; legend as follows]
A: Architecture, Representation, Theory & Optimization
B: Recognition, Understanding, Segmentation & Retrieval
C: Language, Reasoning, Body & Applications
D: Synthesis
E: Video & 3D

Result 9
Methods tends to be long overall
(presenting the proposed method matters most)
Experiments is the largest section in some cases
(presenters want to show many concrete examples)

Result 10
Category A has the fewest slides in total
because each slide is explained at greater length
(especially in Methods)

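Number of presentations with a demo
Result 11
• Category A has the fewest demos
• Theory and method take priority over demos
• Every presentation in category D includes a demo
• Showing real examples makes the work easier to understand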

Number of presentations containing each section
Result 12
Section          A   B   C   D   E
Title           20  20  20  20  20
Overview         2   5   5   4   9
Backgrounds     17  16  19  16  18
Related works    7  11  10  12  11
Methods         20  20  20  20  20
Experiments     20  20  20  20  19
Future works     3   0   1   1   0
Conclusion       7  10   7  10  10
Table 1: Number of presentations containing each section, by category (20 presentations per category)
• Backgrounds, Methods, and Experiments appear in almost every presentation
• Few presentations include Overview or Future works
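The per-category counts in Table 1 can be totaled to see how common each section is across all 100 talks. The sketch below copies the counts from the table; the names SECTION_COUNTS, section_totals, and section_share are hypothetical, and the 20 talks per category follow from the survey design (2 years × 10 videos).

```python
CATEGORIES = ("A", "B", "C", "D", "E")
TALKS_PER_CATEGORY = 20  # 2 years x 10 videos per category

# Counts copied from Table 1, in category order A..E.
SECTION_COUNTS = {
    "Title":         [20, 20, 20, 20, 20],
    "Overview":      [2, 5, 5, 4, 9],
    "Backgrounds":   [17, 16, 19, 16, 18],
    "Related works": [7, 11, 10, 12, 11],
    "Methods":       [20, 20, 20, 20, 20],
    "Experiments":   [20, 20, 20, 20, 19],
    "Future works":  [3, 0, 1, 1, 0],
    "Conclusion":    [7, 10, 7, 10, 10],
}


def section_totals(counts):
    """Sum each section's counts over the five categories."""
    return {section: sum(row) for section, row in counts.items()}


def section_share(counts, talks_per_category=TALKS_PER_CATEGORY):
    """Fraction of all surveyed talks that contain each section."""
    total_talks = talks_per_category * len(CATEGORIES)
    return {section: sum(row) / total_talks for section, row in counts.items()}


totals = section_totals(SECTION_COUNTS)
shares = section_share(SECTION_COUNTS)
for section in SECTION_COUNTS:
    print(f"{section:<14}{totals[section]:>4}{shares[section]:>7.0%}")
```

With these counts, Overview appears in 25 of 100 talks and Future works in 5 of 100, while Backgrounds appears in 86 and Experiments in 99.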

• 89 of 100 videos
• Order: Title, Overview, Backgrounds, Related works, Methods, Experiments, Future works, Conclusion
• 10 of 100 videos
• Backgrounds comes before Overview
• 1 of 100 videos
• Order: Title, Backgrounds, Related works, Overview, Methods, Experiments
Order of sections
Result 13
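One way to check a talk against the most common order is to test whether its sections appear in the same relative order as the canonical list. The sketch below assumes that sections missing from a talk are simply skipped (the slides do not say how absent sections were handled); the function name follows_canonical_order is hypothetical.

```python
# Most common section order reported in the survey (89 of 100 videos).
CANONICAL_ORDER = [
    "Title", "Overview", "Backgrounds", "Related works",
    "Methods", "Experiments", "Future works", "Conclusion",
]


def follows_canonical_order(sections):
    """Return True if the given sections appear in canonical relative order.

    Assumption: a talk may omit sections; only the order of the
    sections it does contain is checked.
    """
    positions = [CANONICAL_ORDER.index(name) for name in sections]
    return all(a < b for a, b in zip(positions, positions[1:]))


# A talk without Overview or Related works still follows the canonical order.
print(follows_canonical_order(
    ["Title", "Backgrounds", "Methods", "Experiments", "Conclusion"]
))  # True

# The single outlier order reported in the survey places Overview late.
print(follows_canonical_order(
    ["Title", "Backgrounds", "Related works", "Overview", "Methods", "Experiments"]
))  # False
```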

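Insights from the survey
Result 14
Effective techniques
• Video and animation: show the audience where to focus
• Conceptual diagrams and flowcharts: convey difficult content clearly
• Demos: convey the proposed method and its effect through real examples
• Annotate equations
• Make deliberate use of color and fonts
In-person sessions
• The gap between a clear, confident delivery and reading a script in a monotone is striking
Online sessions
• Recording quality, recording environment, and clear pronunciation matter
• If you are not confident in your pronunciation, adding subtitles helps the audience
• Record each section separately and join the clips in video editing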

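• Watched many CVPR oral session videos and surveyed presentation techniques in the CV field
• Understanding trends in presentations should help researchers convey their own results more effectively
cvpresentation.challenge
Conclusion 15
Let's launch cvpresentation.challenge!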

Recommended presentation videos
Bonus 1 16
• Clear explanation using animation
Ozan Unal. “Scribble-Supervised LiDAR Semantic Segmentation”. (CVPR 2022)
https://www.youtube.com/watch?v=vlYmqml2svs
• Design method shown through patchwork and a software demonstration
Mackenzie Leake. “A Mathematical Foundation for Foundation Paper Pieceable Quilts”. (SIGGRAPH 2021)
https://www.youtube.com/watch?v=g04VgzzRhlQ
• Rich gestures and an entertaining presentation
Will Stephen. “How to sound smart in your TEDx Talk” (Japanese title: 頭良さそうにTED風プレゼンをする方法). (TEDxNewYork)
https://www.youtube.com/watch?v=ToJD5r2SmwI
• Order of explanation and an elegant payoff of earlier setups
田崎晴明. “どうして時間は過去から未来に流れていくのだろう？マクロな系における不可逆性” (Why does time flow from the past to the future? Irreversibility in macroscopic systems).
https://www.youtube.com/watch?v=m21X0s3VLOM
• Concrete experimental demo after the talk
Leyao Liu. “INS Conv: Incremental Sparse convolution for Online 3D Segmentation”. (CVPR 2022)
https://www.youtube.com/watch?v=V4gVuNyKaPQ
• Careful explanation using images
Juewen Peng. “BokehMe: When Neural Rendering Meets Classical Rendering”. (CVPR 2022)
https://www.youtube.com/watch?v=e-zr_wCxNc8

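Recommended reading on presentations
Bonus 2 17
• Slide and poster design
高橋佑磨, 片山なつ. “伝わるデザイン｜研究発表のユニバーサルデザイン” (Design That Communicates: Universal Design for Research Presentations). 伝わるデザイン. 2018-12.
https://tsutawarudesign.com
• Slide design
梶原浩太郎. “初期研修医のための学会スライドのキホン” (Conference Slide Basics for Junior Residents). slideshare. 2016-03.
https://www.slideshare.net/k-kajiwara/2016-60055283
• English slide structure and Q&A
孫一寧. “SAY YES! TO 英語プレゼン〜もっと印象に残るプレゼンをしましょう〜” (Say Yes! to English Presentations: Give More Memorable Presentations). Osaka University Library. 2016-06.
https://www.library.osaka-u.ac.jp/doc/TA_20160624_EN.pdf
• Japanese slide structure, delivery, and situation-specific preparation
Itoh Laboratory, Ochanomizu University (お茶の水女子大学伊藤研究室). “研究発表を準備する” (Preparing a Research Presentation). Itoh Laboratory. 2016-06.
http://itolab.is.ocha.ac.jp/~itot/message/ItolabPresentation2016.pdf
• Japanese slide structure, design, and delivery
高道慎之介. “研究発表のためのプレゼンテーション技術” (Presentation Techniques for Research Talks). slideshare. 2015-06.
https://www.slideshare.net/ShinnosukeTakamichi/ss-48987441
• Poster design
小野英理. “センス不要！伝わる研究ポスター作成術” (No Design Sense Required: Making Research Posters That Communicate). K-CONNEX. 2016-07.
http://k-connex.kyoto-u.ac.jp/ja/wp-content/uploads/sites/2/2016/07/160711-posterseminar-pub.pdf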

• Oral session video categories are defined based on the CVPR session names
• 2019A: Architecture, Representation, Theory & Optimization
• Deep Learning; Scenes & Representation; Learning, Physics, Theory, & Datasets; Low-Level & Optimization
• 2019B: Recognition, Understanding, Segmentation & Retrieval
• Recognition; Segmentation & Grouping
• 2019C: Language, Reasoning, Body & Applications
• Motion & Biometrics; Language & Reasoning; Applications; Face & Body
• 2019D: Synthesis
• Synthesis
• 2019E: Video & 3D
• 3D Multiview; Action & Video; 3D Single View & RGBD
• Excluded sessions
• Computational Photography & Graphics
Oral session video categories
Appendix 1 18
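The 2019 mapping from CVPR session names to survey categories can be written as a lookup table. This is a minimal sketch using the session names listed above; the names SESSION_TO_CATEGORY_2019, EXCLUDED_SESSIONS_2019, and categorize_2019 are hypothetical and not part of the original survey.

```python
from typing import Optional

# CVPR 2019 session name -> survey category, as listed in Appendix 1.
SESSION_TO_CATEGORY_2019 = {
    "Deep Learning": "A",
    "Scenes & Representation": "A",
    "Learning, Physics, Theory, & Datasets": "A",
    "Low-Level & Optimization": "A",
    "Recognition": "B",
    "Segmentation & Grouping": "B",
    "Motion & Biometrics": "C",
    "Language & Reasoning": "C",
    "Applications": "C",
    "Face & Body": "C",
    "Synthesis": "D",
    "3D Multiview": "E",
    "Action & Video": "E",
    "3D Single View & RGBD": "E",
}

# Sessions left out of the survey.
EXCLUDED_SESSIONS_2019 = {"Computational Photography & Graphics"}


def categorize_2019(session: str) -> Optional[str]:
    """Return the survey category for a 2019 session, or None if excluded.

    Raises KeyError for a session name that is not in the appendix.
    """
    if session in EXCLUDED_SESSIONS_2019:
        return None
    return SESSION_TO_CATEGORY_2019[session]


print(categorize_2019("Action & Video"))  # E
print(categorize_2019("Computational Photography & Graphics"))  # None
```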

• Oral session video categories are defined based on the CVPR session names
• 2020A: Architecture, Representation, Theory & Optimization
• Adversarial Learning; Efficient Training and Inference; Low-Level and Physics-Based Vision; Transfer/Low-Shot/Semi/Unsupervised Learning; Representation Learning; Optimization and Learning Methods; Machine Learning Architectures and Formulations
• 2020B: Recognition, Understanding, Segmentation & Retrieval
• Image Retrieval; Datasets and Evaluation; Scene Analysis and Understanding; Segmentation, Grouping and Shape; Architecture, Representation, Theory & Optimization
• 2020C: Language, Reasoning, Body & Applications
• Medical, Biological and Cell Microscopy; Face, Gesture, and Body Pose; Motion and Tracking; Vision & Language; Vision for Robotics and Autonomous Vehicles; Vision Applications and Systems; Vision & Other Modalities; Visual Reasoning and Logical Representation
• 2020D: Synthesis
• Image and Video Synthesis
• 2020E: Video & 3D
• 3D From a Single Image and Shape-From-X; Action and Behavior; 3D From Multiview and Sensors; Video Analysis and Understanding
• Excluded sessions
• Computational Photography; Explainable AI; Fairness, Accountability, Transparency and Ethics in Vision
Oral session video categories
Appendix 1 19

• 2019A
• Finding Task-Relevant Features for Few-Shot Learning by Category Traversal
• Probabilistic Permutation Synchronization Using the Riemannian Structure of the Birkhoff Polytope
• Lifting Vectorial Variational Problems: A Natural Formulation Based on Geometric Measure Theory and Discrete Exterior Calculus
• ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
• Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
• SOSNet: Second Order Similarity Regularization for Local Descriptor Learning
• Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
• Learning Video Representations From Correspondence Proposals
• A Generative Adversarial Density Estimator
• Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
• 2020A
• Search to Distill: Pearls Are Everywhere but Not the Eyes
• Circle Loss: A Unified Perspective of Pair Similarity Optimization
• Learning Combinatorial Solver for Graph Matching
• Revisiting Knowledge Distillation via Label Smoothing Regularization
• Benchmarking Adversarial Robustness on Image Classification
• HyperSTAR: Task-Aware Hyperparameters for Deep Networks
• ActBERT: Learning Global-Local Video-Text Representations
• Hyperbolic Image Embeddings
• Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations
• How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework
List of paper titles for the surveyed oral session videos
Appendix 2 20

• 2019B
• Joint Discriminative and Generative Learning for Person Re-Identification
• Gradient Matching Generative Networks for Zero-Shot Learning
• Semantic Correlation Promoted Shape-Variant Context for Segmentation
• C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection
• Domain Generalization by Solving Jigsaw Puzzles
• Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network
• Deep Metric Learning Beyond Binary Supervision
• Panoptic Feature Pyramid Networks
• Learning to Cluster Faces on an Affinity Graph
• Transferrable Prototypical Networks for Unsupervised Domain Adaptation
• 2020B
• Dynamic Graph Message Passing Networks
• Learning User Representations for Open Vocabulary Image Hashtag Prediction
• Momentum Contrast for Unsupervised Visual Representation Learning
• PointRend: Image Segmentation As Rendering
• Few-Shot Class-Incremental Learning
• ViBE: Dressing for Diverse Body Shapes
• Interactive Object Segmentation With Inside-Outside Guidance
• Detection in Crowded Scenes: One Proposal, Multiple Predictions
• Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
• Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition
Appendix 2 21

• 2019C
• High-Quality Face Capture Using Anatomical Muscles
• 3D Hand Shape and Pose Estimation From a Single RGB Image
• Deeper and Wider Siamese Networks for Real-Time Visual Tracking
• GFrames: Gradient-Based Local Reference Frame for 3D Shape Matching
• CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark
• Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
• Monocular Total Capture: Posing Face, Body, and Hands in the Wild
• Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in a Triadic Interaction
• Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields
• ATOM: Accurate Tracking by Overlap Maximization
• 2020C
• REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
• Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
• SAPIEN: A SimulAted Part-Based Interactive ENvironment
• LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
• TA-Student VQA: Multi-Agents Training by Self-Questioning
• P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds
• Collaborative Motion Prediction via Neural Motion Message Passing
• Iterative Context-Aware Graph Inference for Visual Dialog
• Reciprocal Learning Networks for Human Trajectory Prediction
• Counterfactual Vision and Language Learning
Appendix 2 22

• 2019D
• Semantics Disentangling for Text-To-Image Generation
• Progressive Pose Attention Transfer for Person Image Generation
• Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation
• Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation
• DeepVoxels: Learning Persistent 3D Feature Embeddings
• Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping
• Animating Arbitrary Objects via Deep Motion Transfer
• Label-Noise Robust Generative Adversarial Networks
• DLOW: Domain Flow for Adaptation and Generalization
• CollaGAN: Collaborative GAN for Missing Image Data Imputation
• 2020D
• Attentive Normalization for Conditional Image Generation
• Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting
• SynSin: End-to-End View Synthesis From a Single Image
• Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning
• Blurry Video Frame Interpolation
• Disentangled Image Generation Through Structured Noise Injection
• Cross-Domain Correspondence Learning for Exemplar-Based Image Translation
• SketchyCOCO: Image Generation From Freehand Scene Sketches
• Single Image Reflection Removal With Physically-Based Training Images
• Semantic Pyramid for Image Generation
Appendix 2 23

• 2019E
• Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
• STEP: Spatio-Temporal Progressive Learning for Video Action Detection
• GA-Net: Guided Aggregation Net for End-To-End Stereo Matching
• Deep Reinforcement Learning of Volume-Guided Progressive View Inpainting for 3D Point Scene Completion From a Single Depth Image
• Revealing Scenes by Inverting Structure From Motion Reconstructions
• BAD SLAM: Bundle Adjusted Direct RGB-D SLAM
• Pushing the Boundaries of View Extrapolation With Multiplane Images
• What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment
• NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences
• Gaussian Temporal Awareness Networks for Action Localization
• 2020E
• Extreme Relative Pose Network Under Hybrid Representations
• X3D: Expanding Architectures for Efficient Video Recognition
• Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve
• Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction
• BSP-Net: Generating Compact Meshes via Binary Space Partitioning
• Single-Shot Monocular RGB-D Imaging Using Uneven Double Refraction
• RoutedFusion: Learning Real-Time Depth Map Fusion
• OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression
• Blur Aware Calibration of Multi-Focus Plenoptic Camera
• Information-Driven Direct RGB-D Odometry
Appendix 2 24
