This document summarizes recent research on applying self-attention mechanisms from Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to divided image patches. It also covers more general attention modules like the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities through frozen weights, showing they can function as universal computation engines.
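As a concrete illustration of the patch-splitting step that ViT-style models share, here is a minimal pure-Python sketch (hypothetical sizes, nested lists standing in for tensors) that divides an image into non-overlapping patches and flattens each into a token vector, as done before the linear projection in ViT:

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patch tokens."""
    h = len(image)
    w = len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0
    tokens = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = []
            for i in range(top, top + patch_size):
                for j in range(left, left + patch_size):
                    patch.extend(image[i][j])  # append the C channel values
            tokens.append(patch)
    return tokens

# Toy 4x4 "RGB" image -> four 2x2 patches, each a 2*2*3 = 12-dim token
img = [[[i, j, 0] for j in range(4)] for i in range(4)]
tokens = image_to_patches(img, 2)
print(len(tokens), len(tokens[0]))  # 4 12
```

In a real ViT each token is then linearly projected to the model dimension and given a positional embedding; this sketch only covers the patch extraction itself.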
An explanation of You Only Look One-level Feature, plus broader chat about object detection, by Yusuke Uchida.
Presentation slides from the 7th All-Japan Computer Vision Study Group "CVPR2021 Reading Session" (Part 1).
https://kantocv.connpass.com/event/216701/
Explains You Only Look One-level Feature, along with general discussion of the YOLO family and a broad look at related object-detection methods.
This document summarizes recent developments in action recognition using deep learning techniques. It discusses early approaches using improved dense trajectories and two-stream convolutional neural networks. It then focuses on advances using 3D convolutional networks, enabled by large video datasets like Kinetics. State-of-the-art results are achieved using inflated 3D convolutional networks and temporal aggregation methods like temporal linear encoding. The document provides an overview of popular datasets and challenges and concludes with tips on training models at scale.
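The "inflated" 3D convolution idea mentioned above (I3D) bootstraps 3D filters from pretrained 2D image filters by repeating each 2D kernel along the time axis and rescaling, so a static (repeated-frame) video produces the same response as the original 2D network. A minimal sketch of that inflation step, using plain Python lists for the kernel (real implementations operate on framework tensors):

```python
def inflate_kernel_2d_to_3d(kernel_2d, time_depth):
    """Inflate a k x k 2D kernel into a T x k x k 3D kernel by
    replicating it T times along time and dividing by T."""
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]

k2d = [[1.0, 2.0], [3.0, 4.0]]
k3d = inflate_kernel_2d_to_3d(k2d, 4)

# Summing the inflated kernel over time recovers the original 2D kernel,
# which is exactly why a static video gives the 2D network's response.
recovered = [[sum(k3d[t][i][j] for t in range(4)) for j in range(2)]
             for i in range(2)]
print(recovered)  # [[1.0, 2.0], [3.0, 4.0]]
```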
[2010] Large-scale Image Classification: Fast Feature Extraction and SVM Training
[2011] High-dimensional signature compression for large-scale image classification
Paper introduction by Kotaro Yamazaki.
RankCompete: Simultaneous ranking and clustering of information networks
https://www.researchgate.net/publication/257352130_RankCompete_Simultaneous_ranking_and_clustering_of_information_networks
SSII2021 [SS2] Deepfake Generation and Detection – An Overview (SSII)
This document provides an overview of deepfake generation and detection. It begins with an introduction to the author, their background, and their research interests. The rest of the document covers definitions of deepfakes; deepfake generation techniques, including face synthesis, manipulation, reenactment, and swapping; and deepfake detection methods, including commonly used datasets and image-based and video-based detection approaches.
51. The Limits of Full Automation
“1. The inker’s main purpose is to translate the penciller’s graphite pencil lines into reproducible, black, ink lines.
2. The inker must honor the penciller’s original intent while adjusting any obvious mistakes.
3. The inker determines the look of the finished art.”
— Gary Martin, The Art of Comic Book Inking [1997]
114. Model
[Figure: U-Net-style network of convolution + ReLU layers with average pooling in the encoder (channel widths 32, 64, 128, 256, 512, 1024) and up-sampling layers in the decoders. Inputs: line drawing X (w×h), split user scribble Ui (w×h×4), and split user scribble mask Mi (w×h). Three decoder branches (each with widths 512, 256, 128, 64, 32): a layer decoder producing the colour layer Ci, supervised by the flat colour map Y (w×h×3); a weight decoder producing the merging weight Wi, supervised by W*i (w×h); and a region decoder producing Si, supervised by the region skeleton map S (w×h). Each branch is trained with an MSE loss.]
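The encoder above halves spatial resolution with average pooling while the decoders restore it with up-sampling. A minimal pure-Python sketch of those two building blocks (2x2 average pooling and nearest-neighbour 2x up-sampling on a single-channel map; sizes and the nearest-neighbour choice are illustrative assumptions, not the paper's exact layers):

```python
def avg_pool_2x2(x):
    """2x2 average pooling on a single-channel map (nested lists)."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

def upsample_2x(x):
    """Nearest-neighbour 2x up-sampling (each value becomes a 2x2 block)."""
    out = []
    for row in x:
        widened = [v for v in row for _ in range(2)]
        out.append(widened)
        out.append(list(widened))
    return out

fmap = [[1.0, 1.0, 3.0, 3.0],
        [1.0, 1.0, 3.0, 3.0],
        [5.0, 5.0, 7.0, 7.0],
        [5.0, 5.0, 7.0, 7.0]]
pooled = avg_pool_2x2(fmap)      # [[1.0, 3.0], [5.0, 7.0]]
restored = upsample_2x(pooled)   # back to 4x4 with the same block values
print(restored == fmap)          # True
```

In the actual model these operations sit between learned convolution + ReLU layers; the pooling/up-sampling pair only changes resolution, not channel width.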
118. Motivation
[Figure: lighting effects driven by stroke density. Panels compare the original image and its stroke history with effects from measured density, estimated density, our method, and artists painting both conditioned and unconditioned, with the artists' painting times (ranging from 29:11 to 241:37) noted under each result.]
119. Overview
[Figure: artists' workflow vs. the proposed algorithm. Artists' workflow: (a) artist's real stroke history (33:24 / 241:59), (b) measured stroke density, (c) artist's coarse effect layer, (d) artist's refined effect layer, (e) visualization of painted patches (low-density and high-density), (f) artist's final lighting effect, (g) effect created with another style. Proposed algorithm: (h) original image (R), (i) extracted palette (M), (j) estimated stroke density (K), (k) normalized channel intensity (N), (l) coarse lighting effect (E), (m) refined lighting effect (S), (n) output (I).]