Deep learning approaches to video understanding
Toru Tamaki (玉木徹), Nagoya Institute of Technology
2023/12/14
What is action recognition?
■ One of the video understanding tasks
■ Recognizing human actions (Action Recognition, AR)
• 動き: motion
• 動作: action, activity
• 行動: behavior
■ Classification of videos
• not limited to "videos of a person performing an action"
• some methods do restrict themselves to humans
• combined with person detection or pose estimation
[Figure: image recognition (image → model → category) vs. action recognition (video → model → category); examples from the cats-and-dogs dataset and from Kinetics400 [Kay+, arXiv2017] (2xxcpLQHZf8_000002_000012.mp4)]
Video = a time series of still frames
■ Input: a video
• one extra dimension, along time
• temporal and motion information must be modeled
• "temporal modeling"
[Figure: the frames of a clip arranged along the time axis; example from Kinetics400 [Kay+, arXiv2017] (2xxcpLQHZf8_000002_000012.mp4)]
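To make the extra dimension concrete, a minimal sketch (PyTorch assumed; shapes illustrative, not from the slides):

```python
# An image batch is (B, C, H, W); a video batch adds a time axis: (B, C, T, H, W).
import torch

B, C, T, H, W = 2, 3, 16, 224, 224
video = torch.randn(B, C, T, H, W)   # 16 RGB frames per clip

# Treating the clip as independent frames for a 2D model:
frames = video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
print(frames.shape)                  # torch.Size([32, 3, 224, 224])

# A 3D convolution instead mixes space and time jointly:
conv3d = torch.nn.Conv3d(C, 64, kernel_size=(3, 3, 3), padding=1)
print(conv3d(video).shape)           # torch.Size([2, 64, 16, 224, 224])
```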
Datasets
Action recognition datasets
[Timeline figure, 2004–2022. Early, pre-deep era: KTH, Weizmann, IXMAS, Hollywood, Hollywood2, UCF Sports, Olympic Sports, MPII Cooking/MPII Composites/MPII Cooking 2, UCF11/UCF50/UCF101 (UCF101-24 = THUMOS13), HMDB51, JHMDB21. Later: THUMOS14/15, MultiTHUMOS, Sports-1M, ActivityNet v1.2/v1.3, ActivityNet Entities, Charades, Charades-Ego, Kinetics-400/600/700/700-2020 (plus the Kinetics-100 and Mini-Kinetics-200 subsets), SSv1/SSv2, Jester, MiT, Multi-MiT, AVA Actions, AVA-Kinetics, HACS Clip/HACS Segment, HVU, DALY, Breakfast, 50Salads, Coffee & Cigarettes, Hollywood-Extended, YouCook/YouCook2, YouTube-8M (ver2016/2017/2018) and YouTube-8M Segments, HowTo100M, EPIC-KITCHENS-55/100, Diving48, FineGym, HAA500, Visual Genome, Action Genome, Home Action Genome, Ego4D. Annotation types are marked: action label(s) per video/clip, temporal annotation, spatio-temporal annotation; further marks distinguish machine-generated labels, multi-label datasets, and untrimmed YouTube videos vs. worker/lab videos. Vision milestones shown for reference: HOG, SURF, FAST, ORB, ImageNet, AlexNet, ResNet, U-Net, GAN, Transformer, ViT.]
HMDB51
■ Human Motion DataBase
• sources: digitized movies, the Prelinger Archive, YouTube and Google Videos, etc.
• the existing UCF-Sports and Olympic Sports use YouTube as their only source, have ambiguous actions, and can often be recognized from the person's pose alone
■ 51 categories, 6,766 videos
• at least 101 videos per category
• mostly 1–5 s long, about 3.15 s on average
• min 0.63 s, max 35.43 s
[Jhuang+, ICCV2011]
[Embedded first page of the HMDB51 paper [Jhuang+, ICCV2011]. The abstract motivates a new benchmark: existing action recognition databases contained on the order of ten categories collected under fairly controlled conditions, with state-of-the-art performance near ceiling. The paper introduces the largest action video database to date, with 51 action categories and around 7,000 manually annotated clips extracted from sources ranging from digitized movies to YouTube, and uses it to evaluate two representative action recognition systems under varying camera motion, viewpoint, video quality and occlusion. Figure 1 shows sample frames (hand-waving, drinking, sword fighting, ...).]
UCF101
■ University of Central Florida
• source: YouTube
• manually cleaned
• successor to UCF-Sports/11/50
■ 101 categories, 13,320 videos
• min 1.06 s, max 71.04 s, mean 7.21 s
[Soomro+, arXiv, 2012]
Kinetics
■ The DeepMind Kinetics human action video dataset
• source: YouTube
• K400: train 22k, val 18k, test 35k
• each video is at most 10 s
• evaluated with top-1 and top-5 accuracy
■ Versions
• Kinetics-400 [Kay+, arXiv2017]
• Kinetics-600 [Carreira+, arXiv2018]
• Kinetics-700 [Carreira+, arXiv2019]
• Kinetics-700-2020 [Smaira+, arXiv2020]
■ Policy: one clip per video
• HMDB and UCF take multiple clips from one video
[Kay+, arXiv2017]
[Figure: sample Kinetics classes, e.g. (a) headbanging, (c) shaking hands, (e) robot dancing]
SSv2
■ something-something v2
• the model should understand noun/verb patterns, not just action labels
• 174 template sentences = 174 labels
• "Dropping [something] into [something]"
• "Stacking [number of] [something]"
• "something" is a placeholder filled by the name of the object being acted on
• 221k videos
• train 167k, val 25k, test 27k
• 4.03 s on average (v1)
v1 [Goyal+, ICCV2017], v2 [Mahdisoltani+, arXiv2018]
[Figure 4: example videos and corresponding descriptions (object entries in italics), e.g. "Putting a white remote into a cardboard box", "Pretending to put candy onto chair", "Pushing a green chilli so that it falls off the table", "Moving puncher closer to scissor"]
Action recognition models
[Timeline figure, 2012–2022: action recognition models by family. Non-deep: DT, IDT. 2D CNN + 1D aggregation: Two-Stream, TSN, TSM. Restricted 3D / (2+1)D: P3D, S3D, R(2+1)D. Full 3D CNN: C3D, I3D, 3D ResNet (R3D), Non-Local, SlowFast, X3D. Vision Transformer (2D + 1D aggregation and beyond): ViViT, TimeSformer, STAM, Video Transformer Network, VidTr, X-ViT, TokenShift, VideoSwin. Context milestones: ImageNet, ResNet, U-Net, GAN, Kinetics, Transformer, ViT.]
Fusion: apply a 2D CNN to each frame
■ Ways to apply a 2D CNN to video
• single frame: use only one frame
• late fusion: apply the 2D CNN to every frame, then aggregate along time at the end
• early fusion: feed several frames at once
• TSN
• slow fusion: merge gradually inside the network (lateral fusion)
■ The ideas and the terminology live on
• late fusion is still very much alive
• apply per frame, then aggregate by simple averaging, a 1D CNN, a Transformer, etc.
• used as a baseline
[Karpathy+, CVPR2014]
[Figure 1 of Karpathy+, CVPR2014: explored approaches for fusing information over the temporal dimension through the network. Red, green and blue boxes indicate convolutional, normalization and pooling layers respectively; in the Slow Fusion model, the depicted columns share parameters. The adjacent paper text ("3.1. Time Information Fusion in CNNs") is cropped.]
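A minimal late-fusion sketch (PyTorch/torchvision assumed; the ResNet-18 backbone and mean aggregation are illustrative choices, not the paper's exact setup):

```python
# Late fusion: a shared 2D CNN scores every frame; scores are averaged over time.
import torch
import torchvision

backbone = torchvision.models.resnet18(num_classes=400)   # any 2D CNN

def late_fusion(video):                       # video: (B, C, T, H, W)
    B, C, T, H, W = video.shape
    frames = video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
    logits = backbone(frames)                 # (B*T, 400), shared weights
    return logits.reshape(B, T, -1).mean(1)   # aggregate along time

scores = late_fusion(torch.randn(2, 3, 8, 224, 224))       # (2, 400)
```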
Two-Stream network
■ Uses RGB and optical flow
• one 2D CNN for each stream
• RGB: a single frame
• flow: a stack of several frames
• trajectories obtained by chaining the flow were also tried, but performed worse
■ On using flow
• from this point on, CNN models on RGB frames keep showing the same pattern: "adding flow at the end boosts performance"
■ A large number of derived methods
[Simonyan&Zisserman, NIPS2014]
[Figure: the two-stream architecture. A spatial stream ConvNet takes a single RGB frame and a temporal stream ConvNet takes multi-frame optical flow; both are 2D CNNs (conv1 7x7x96 stride 2 ... conv5 3x3x512, full6 4096, full7 2048, softmax), and the class scores are fused at the end.]
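A minimal two-stream sketch (PyTorch/torchvision assumed; the ResNet-18 backbones and averaged-softmax fusion are illustrative stand-ins for the paper's VGG-style networks and score fusion; flow computation itself is omitted):

```python
# Two streams: one 2D CNN on an RGB frame, one on a stack of 2L flow channels
# (x/y displacement per frame), fused at the class-score level.
import torch
import torchvision

L = 10                                   # flow stack length, as in the paper
spatial = torchvision.models.resnet18(num_classes=101)
temporal = torchvision.models.resnet18(num_classes=101)
temporal.conv1 = torch.nn.Conv2d(2 * L, 64, kernel_size=7, stride=2,
                                 padding=3, bias=False)   # accept the flow stack

rgb = torch.randn(2, 3, 224, 224)         # single frame
flow = torch.randn(2, 2 * L, 224, 224)    # stacked x/y flow fields
score = (spatial(rgb).softmax(-1) + temporal(flow).softmax(-1)) / 2
```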
TSN
■ Temporal Segment Networks
• a simple trick
• split the clip into segments
• run a two-stream network on each
• fuse at the end
• gives long-range temporal modeling
• (flow alone covers only short-range temporal modeling)
[Wang+, ECCV2016]
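A sketch of TSN's segment sampling and segmental consensus (PyTorch assumed; one RGB snippet per segment and mean consensus are simplifications — the flow stream and other consensus functions are omitted):

```python
# TSN: split the video into K segments, sample one snippet from each, score all
# snippets with a shared backbone, and fuse with a consensus function (here: mean).
import torch

def tsn_consensus(video, backbone, num_segments=3):
    B, C, T, H, W = video.shape
    # one random frame index inside each of the K equal segments
    bounds = torch.linspace(0, T, num_segments + 1).long()
    idx = [torch.randint(int(bounds[k]), int(bounds[k + 1]), (1,)).item()
           for k in range(num_segments)]
    snippet_scores = [backbone(video[:, :, t]) for t in idx]  # K x (B, classes)
    return torch.stack(snippet_scores).mean(0)                # segmental consensus
```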
(2+1)D CNN
■ Cutting the cost of 3D convolution
• combine a spatial 2D conv with a temporal 1D conv
■ Representative separable convolutions
• P3D [Qiu+, ICCV2017]
• S3D [Xie+, ECCV2018]
• R(2+1)D [Tran+, CVPR2018]
• also compares against R3D (i.e., 3D ResNet)
■ 2D/1D conv inside one module
• P3D, S3D
■ Splitting layers into 2D and 3D
• S3D, MCx/rMCx
[Figure 2 of R(2+1)D: (2+1)D vs 3D convolution, shown for the simplified setting of a single input feature channel. (a) Full 3D convolution uses a filter of size t × d × d, where t is the temporal extent and d the spatial width/height. (b) A (2+1)D block splits the computation into a spatial 2D convolution followed by a temporal 1D convolution; the number of 2D filters Mᵢ is chosen so that the parameter count matches the full 3D block.]
The accompanying text defines the mixed-convolution variants: for an R3D with 5 groups of convolutions, MC5 replaces all 3D convolutions in group 5 with 2D ones, MC4 does so from group 4 up, and MC3/MC2 follow the same pattern (MC1 is omitted, being equivalent to a 2D ResNet).
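A minimal (2+1)D block in PyTorch (assumed), mirroring Figure 2(b): a 1×d×d spatial convolution into an intermediate width `mid` (playing the role of Mᵢ), then a t×1×1 temporal convolution:

```python
# (2+1)D factorization of a t x d x d 3D convolution, with an extra ReLU
# between the spatial and temporal parts (one of the benefits the paper notes).
import torch.nn as nn

def conv2plus1d(n_in, n_out, mid, t=3, d=3):
    return nn.Sequential(
        nn.Conv3d(n_in, mid, kernel_size=(1, d, d),
                  padding=(0, d // 2, d // 2)),   # spatial 2D conv
        nn.ReLU(inplace=True),                    # added nonlinearity
        nn.Conv3d(mid, n_out, kernel_size=(t, 1, 1),
                  padding=(t // 2, 0, 0)),        # temporal 1D conv
    )
```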
[Figure 3 of R(2+1)D: training and testing errors for R(2+1)D vs R3D, for ResNets of 18 layers (left) and 34 layers (right). The training error is smaller for R(2+1)D than for R3D, particularly for the deeper network, suggesting that the spatiotemporal decomposition eases optimization, especially as depth increases.]
We choose
$$M_i = \left\lfloor \frac{t\,d^2\,N_{i-1}N_i}{d^2 N_{i-1} + t N_i} \right\rfloor$$
so that the number of parameters in the (2+1)D block is approximately equal to that implementing full 3D convolution. This spatiotemporal decomposition can be applied to any 3D convolutional layer; Figure 2 illustrates it for the simplified setting where the input tensor $z_{i-1}$ contains a single channel (i.e., $N_{i-1}=1$). If the 3D convolution has spatial or temporal striding (implementing downsampling), the striding is correspondingly decomposed into its spatial or temporal dimensions; the architecture is illustrated in Figure 1(e). Compared to full 3D convolution, the (2+1)D decomposition offers two advantages. [text continues]
R(2+1)D [Tran+, CVPR2018]
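As a worked instance of this choice (numbers picked for illustration, not from the paper): with $t=3$, $d=3$ and $N_{i-1}=N_i=64$,

```latex
M_i = \left\lfloor \frac{t\,d^2\,N_{i-1}N_i}{d^2 N_{i-1} + t N_i} \right\rfloor
    = \left\lfloor \frac{3\cdot 9\cdot 64\cdot 64}{9\cdot 64 + 3\cdot 64} \right\rfloor
    = \left\lfloor \frac{110592}{768} \right\rfloor
    = 144,
```

and the (2+1)D block then has $d^2 N_{i-1} M_i + t\,M_i N_i = (576+192)\cdot 144 = 110592$ parameters, exactly matching the $t\,d^2\,N_{i-1}N_i = 110592$ of the full 3D filter bank.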
[Figure 3 of S3D: (a) 2D Inception block; (b) 3D Inception block; (c) 3D temporal separable Inception block used in S3D networks. The blocks are built from 1x1x1, 1x3x3 and 3x3x3 convolutions, 3x1x1 temporal convolutions, and max-pooling branches whose outputs are concatenated.]
S3D [Xie+, ECCV2018]
[Figure 2 of S3D: network architecture details for (a) I3D, (b) I2D, (c) Bottom-heavy I3D and (d) Top-heavy I3D. Each takes a 64-frame video through a stem (7,7,7 or 1,7,7 conv with stride 2, followed by max-pools) and a stack of Inception blocks to a 400-d prediction; K indexes the spatio-temporal convolutional layers, and "2D Inc."/"3D Inc." refer to the Inception blocks of Figure 3. Bottom-heavy variants use 3D blocks in the early layers and 2D blocks on top; top-heavy variants do the reverse.]
[Figure 1 of R(2+1)D: residual network architectures for video classification. (a) R2D: 2D ResNets over the whole clip; (b) MCx: ResNets with mixed convolutions (3D in the early groups, 2D afterwards; MC3 shown); (c) rMCx: reversed mixed convolutions (2D early, 3D late; rMC3 shown); (d) R3D: 3D ResNets; (e) R(2+1)D: ResNets with (2+1)D convolutions. All take a clip, end in space-time pooling and a fully connected layer; residual connections are omitted for interpretability.]
3. Convolutional residual blocks for video
In this section we discuss several spatiotemporal convolutional variants within the framework of residual learning. Let x denote the input clip of size 3 × L × H × W, where L is the number of frames in the clip, H and W are the frame height and width, and 3 refers to the RGB channels. Let $z_i$ be the tensor computed by the i-th convolutional block in the residual network. In this work we consider only "vanilla" residual blocks (i.e., without bottlenecks) [13], each block consisting of two convolutional layers with a ReLU activation function after each layer. The output of the i-th residual block is then
$$z_i = z_{i-1} + \mathcal{F}(z_{i-1}; \theta_i) \qquad (1)$$
where $\mathcal{F}(\cdot\,; \theta_i)$ implements the composition of the two convolutions parameterized by weights $\theta_i$ and the application of the ReLU functions. [...] dimensions of the preceding tensor $z_{i-1}$; each filter yields a single-channel output. Thus the very first convolutional layer of R2D collapses the entire temporal information of the video into single-channel feature maps, which prevents any temporal reasoning in subsequent layers. This type of CNN architecture is illustrated in Figure 1(a). Note that since the feature maps have no temporal meaning, no temporal striding is performed for this network.
3.2. f-R2D: 2D convolutions over frames
Another 2D CNN approach processes the L frames independently via a series of 2D convolutional residual blocks, with the same filters applied to all L frames. In this case, no temporal modeling is performed in the convolutional layers; the global spatiotemporal pooling layer at the top simply fuses the information extracted independently from the L frames. We refer to this architecture variant as f-R2D (frame-based R2D).
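A minimal sketch of the vanilla residual block of Eq. (1) (PyTorch assumed; here one ReLU sits between the convolutions and one after the addition, a common arrangement that slightly simplifies the paper's wording):

```python
# Vanilla residual block: z_i = z_{i-1} + F(z_{i-1}; theta_i), with F two
# 3D convolutions and ReLUs, plus the identity shortcut.
import torch.nn as nn

class VanillaBlock(nn.Module):
    def __init__(self, channels, kernel=(3, 3, 3)):
        super().__init__()
        pad = tuple(k // 2 for k in kernel)
        self.f = nn.Sequential(
            nn.Conv3d(channels, channels, kernel, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel, padding=pad),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, z):                  # z: (B, C, T, H, W)
        return self.relu(z + self.f(z))    # shortcut + F(z; theta)
```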
P3D [Qiu+, ICCV2017]
[Figure 3 of P3D: bottleneck building blocks of (a) the Residual Unit and the Pseudo-3D variants (b) P3D-A, (c) P3D-B, (d) P3D-C, each combining 1x1x1, 1x3x3 (spatial) and 3x1x1 (temporal) convolutions with ReLUs inside a residual connection, in serial, parallel, or serial-with-skip arrangements. Tables 1 and 2 (cropped) compare ResNet-50 and the P3D ResNet variants in model size, speed and accuracy, and list pre-training data and clip length against Deep Video (single frame / slow fusion), Convolutional Pooling, C3D and ResNet-152.]
3D ResNet
■ A 3D version of ResNet
• showed that deep 3D ResNets do perform well when trained on the large-scale Kinetics
■ Aside
• the paper title makes it hard to tell the content or the model name
• which may be why it is cited less than it deserves...?
[Hara+, CVPR2018]
X3D
■ Searching for an optimal architecture
• search in the spirit of NAS (Neural Architecture Search)
• based on the Fast pathway (a ResNet) of SlowFast
• varies several expansion axes: spatial/temporal resolution, number of channels, etc.
• greedy search, one axis at a time
• proposes X3D instances at multiple scales
• quite lightweight
[Feichtenhofer, CVPR2020]
[Figure 2 of X3D: progressive network expansion. Starting from an X2D base model, the expansion axes (temporal stride τ, duration t, depth d, width w, bottleneck width b, spatial resolution s) are enlarged one greedy step at a time, tracing out X3D-XS/S/M/L/XL on a curve of Kinetics top-1 accuracy (%) vs. model capacity in GFLOPs (# of multiply-adds × 10⁹). Surrounding paper text is cropped.]
Table 2. Expanded instances on K400-val, 10-center-clip testing:

model    top-1  top-5  regime (FLOPs)    FLOPs (G)  Params (M)
X3D-XS   68.6   87.9   X-Small (≤ 0.6)   0.60       3.76
X3D-S    72.9   90.5   Small (≤ 2)       1.96       3.76
X3D-M    74.6   91.7   Medium (≤ 5)      4.73       3.76
X3D-L    76.8   92.5   Large (≤ 20)      18.37      6.08
X3D-XL   78.4   93.6   X-Large (≤ 40)    35.84      11.0
Per-frame Vision Transformer + late fusion
■ (2+1)D Transformers
• STAM [Sharir+, arXiv2021]
• Video Transformer Network [Neimark+, ICCVW2021]
■ VidTr [Zhang+, ICCV2021]
• (2+1)D attention inside the Transformer block
■ X-ViT [Bulat+, NeurIPS2021]
• restricts attention to neighboring frames
[Figure from X-ViT [Bulat+, NeurIPS2021], comparing attention costs for T frames of S patches each: (a) full space-time attention, O(T²S²); (b) spatial-only attention, O(TS²); (c) TimeSformer [3] and ViViT (Model 3) [1], O(T²S + TS²); (d) X-ViT, O(TS²).]
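A minimal sketch of the shared recipe behind STAM / VTN (PyTorch and torchvision assumed; the ViT-B/16 backbone, 2-layer temporal encoder and mean pooling are illustrative stand-ins — VTN, for instance, uses a Longformer and a CLS token):

```python
# Per-frame ViT features, aggregated by a small temporal Transformer
# (late fusion in Transformer form).
import torch
import torch.nn as nn
import torchvision

class FrameViTLateFusion(nn.Module):
    def __init__(self, num_classes=400, dim=768):
        super().__init__()
        vit = torchvision.models.vit_b_16()
        vit.heads = nn.Identity()            # keep the 768-d CLS feature
        self.spatial = vit
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                # video: (B, C, T, H, W), 224x224
        B, C, T, H, W = video.shape
        frames = video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
        feats = self.spatial(frames).reshape(B, T, -1)   # one vector per frame
        return self.head(self.temporal(feats).mean(1))
```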
[Embedded first page of "Video Transformer Network", Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann (Theator). Abstract: VTN is a transformer-based framework for video recognition that moves away from 3D ConvNets and classifies actions by attending to the entire video sequence; it is generic and builds on top of any given 2D spatial network, trains 16.1× faster and runs 5.1× faster at inference while maintaining competitive accuracy, enables whole-video analysis in a single end-to-end pass with 1.5× fewer GFLOPs, and reports competitive results on Kinetics-400 and Moments in Time, with an ablation of the accuracy/inference-speed trade-off. Code: https://github.com/bomri/SlowFast/blob/master/projects/vtn/README.md. Figure 1: the VTN architecture connects three modules — a 2D spatial backbone f(x) for feature extraction, a temporal attention-based encoder (Longformer) over the resulting feature vectors plus positional encoding, and a classification MLP head on the [CLS] token. The introduction (cropped) notes that even 10-second Kinetics-400 videos are typically processed as short ~2-second clips, and asks how clip-based inference would fare on much longer videos (movies, sports events, surgical procedures).]
Video Transformer Network [Neimark+, ICCVW2021]
STAM [Sharir+, arXiv2021]
[Figure 1 of VidTr: the spatio-temporal separable-attention video transformer takes pixel patches as input. The adjoining text (cropped) notes that the full space-time attention matrix kept in memory grows as O(T²W²H²) with clip length, which is prohibitive on commodity devices and motivates the separable attention design.]
VidTr [Zhang+, ICCV2021]
[Embedded text from STAM. Figure 3 shows the proposed Transformer network for video. The goal is a model that can use sparsely subsampled temporal data while still capturing long-term dependencies: 2D convolution filters are tailor-made for the structure of images, and although a series of 3D convolutions can learn long-term interactions through its increased receptive field, it is biased towards local ones. Feeding leading 3D-convolution methods the same subsampled data degraded them significantly (Table 5: X3D error up by 23%, SlowFast error up by 50%), whereas Transformers model long-term dependencies directly. Separating spatial from temporal attention has two advantages: it reduces computation by breaking one long sequence into two shorter ones (each patch is compared to the N patches within its frame, then each frame embedding to the F frames, instead of every patch attending to NF others), and it applies temporal attention at a higher, more abstract level, just as many architectures use 2D convolutions early and 3D components only on top; frame-level representations convey more of the depicted scene than individual patches do.
Input embeddings: the input X ∈ R^{H×W×3×F} consists of F frames of size H × W sampled from the original video; each frame is divided into patches of size P × P, which are flattened and projected into embedding vectors
$$z^{(0)}_{(p,t)} = E\,x_{(p,t)} + e^{pos}_{(p,t)},$$
where the input vector $x_{(p,t)} \in \mathbb{R}^{3P^2}$ and the embedding $z_{(p,t)} \in \mathbb{R}^D$ are related by the learnable positional embedding $e^{pos}_{(p,t)}$ and matrix E; the indices p and t denote the patch and frame, respectively, with t = 1, ..., F. For classification, a learnable classification token is placed at the first position of each frame's embedding sequence, $z^{(0)}_{(0,t)}$; these per-frame classification tokens encode the information of each frame and are aggregated temporally across the sequence of frames.]
ViViT
■ Model 1: 3D ViT
• makes the ViT input patches 3D (tubelets)
■ Model 2: late fusion, (2+1)D
• a 2D Transformer per frame
• a 1D Transformer along time
[Figure: ViViT's factorized encoder (Model 2). Each frame's patches are embedded to tokens 1..N with a CLS token plus positional and token embeddings, processed by per-frame spatial Transformer encoders; the per-frame outputs, with temporal and token embeddings, pass through a temporal Transformer encoder and an MLP head to the class.]
[Arnab+, ICCV2021]
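A minimal sketch of Model 1's tubelet embedding (PyTorch assumed; the 2×16×16 tubelet size and 768-d width are illustrative):

```python
# 3D patches ("tubelets"): a strided 3D convolution turns a clip into a token
# sequence, so each token covers a 2x16x16 spatio-temporal volume.
import torch
import torch.nn as nn

embed = nn.Conv3d(3, 768, kernel_size=(2, 16, 16), stride=(2, 16, 16))

video = torch.randn(2, 3, 16, 224, 224)
tokens = embed(video)                       # (2, 768, 8, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (2, 8*14*14, 768) token sequence
```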
TimeSformer
■ Restricting attention
• "divided attention"
■ (2+1)D
• space: patches within the same frame
• time: patches at the same position in different frames
[Bertasius+, ICML2021]
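A minimal sketch of divided space-time attention (PyTorch assumed; CLS-token handling and the per-attention LayerNorms are omitted for brevity):

```python
# Divided attention (T+S): the token grid (B, T, S, D) is attended first along
# time (same patch position, different frames), then along space (same frame).
import torch
import torch.nn as nn

class DividedAttention(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.time_att = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_att = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                 # x: (B, T, S, D)
        B, T, S, D = x.shape
        t = x.permute(0, 2, 1, 3).reshape(B * S, T, D)    # attend over T
        t = t + self.time_att(t, t, t)[0]
        x = t.reshape(B, S, T, D).permute(0, 2, 1, 3)
        s = x.reshape(B * T, S, D)                        # attend over S
        s = s + self.space_att(s, s, s)[0]
        return s.reshape(B, T, S, D)
```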
"Is Space-Time Attention All You Need for Video Understanding?"
[Figure: the self-attention schemes investigated — Space Attention (S); Joint Space-Time Attention (ST); Divided Space-Time Attention (T+S: Time Att. → Space Att. → MLP per block); Sparse Local Global Attention (L+G: Local Att. → Global Att. → MLP); Axial Attention (T+W+H: Time Att. → Width Att. → Height Att. → MLP); each block wraps its attentions and MLP in residual connections from z^(ℓ−1) to z^(ℓ).]
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
z(`)
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
z(`)
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
<latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit>
z(` 1)
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
z(` 1)
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
z(` 1)
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
z(` 1)
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
<latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit>
Figure 1. The video self-attention blocks that we investigate in this work. Each attention layer implements self-attention (Vaswani et al.,
2017b) on a specified spatiotemporal neighborhood of frame-level patches (see Figure 2 for a visualization of the neighborhoods). We use
residual connections to aggregate information from different attention layers within each block. A 1-hidden-layer MLP is applied at the
end of each block. The final model is constructed by repeatedly stacking these blocks on top of each other.
the N patches span the entire frame, i.e., N = HW/P^2.
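For reference, here is a minimal PyTorch sketch of such a pre-norm attention block. This is a generic illustration (the dimensions and the use of full self-attention are assumptions); the actual models restrict each attention layer to a specific space/time neighborhood of the patch tokens.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Generic pre-norm block mapping layer input z^(l-1) to output z^(l)."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # 1-hidden-layer MLP applied at the end of each block
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z):
        h = self.norm1(z)
        z = z + self.attn(h, h, h, need_weights=False)[0]  # residual connection
        z = z + self.mlp(self.norm2(z))                    # residual connection
        return z

# tokens of T=8 frames with N=196 patches each, plus one class token
z = torch.randn(2, 8 * 196 + 1, 768)
z = AttentionBlock()(z)  # the model stacks these blocks repeatedly
```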
Object-ABN
Sharp Attention Map Generation for Action Recognition
Tomoya Nitta (Nagoya Institute of Technology)
Tsubasa Hirakawa (Chubu University)
Hironobu Fujiyoshi (Chubu University)
Toru Tamaki (Nagoya Institute of Technology)
SSII2022/IEICE-ED2023
Research motivation
nVideo recognition
• Identifying actions and activities in videos
• Handles spatial (image) and temporal information simultaneously
nAttention mechanisms
• Decide which parts of the input the model should focus on
• Transformer (Vaswani+, NIPS2017)
• Also used in video recognition
• GTA (He+, BMVC2021)
• Non-Local Neural Networks (Wang+, CVPR2018)
• Attention maps are also used for explainable AI
• ABN (Fukui+, CVPR2019)
[Figure: input videos and their attention maps, labeled "diving" and "lifting"]
Object-ABN
[Architecture: an extractor feeds a classifier and an attention branch; an instance segmentation branch provides per-object masks in R^{N×T×H×W}, whose OR (union) gives the all-object mask in R^{1×T×H×W}; the attention branch outputs an attention map in R^{1×T×H×W}.]
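The following is a hypothetical sketch of how the instance masks and the attention map could be combined: the per-object masks are OR-ed into the all-object mask, and a simple mask loss pulls the attention map toward the object regions. The function names and the exact loss are assumptions; the paper's formulation may differ.

```python
import torch

def union_mask(inst_masks: torch.Tensor) -> torch.Tensor:
    """OR over per-object masks: (N, T, H, W) -> all-object mask (1, T, H, W)."""
    return inst_masks.any(dim=0, keepdim=True).float()

def mask_loss(attn_map: torch.Tensor, inst_masks: torch.Tensor) -> torch.Tensor:
    """Penalize attention outside the union of object regions,
    pushing the attention map toward sharp, object-aligned responses."""
    m = union_mask(inst_masks)          # (1, T, H, W)
    return ((attn_map - m) ** 2).mean()
```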
Table 1: Evaluation on the UCF101 validation set. MHA denotes multi-head attention; the row with Lper/attn only corresponds to the original ABN. Entropy is reported for the channel without a mask loss M[:,0,:,:] (unlabeled column), the object channel M[:,1,:,:], and the background (inverse) channel M[:,2,:,:].

Lper/attn | Lmask | MHA | LPC | top-1 | top-5 | entropy | entropy (object) | entropy (inverse)
✓ | | | | 93.96 | 99.15 | 3.064 | |
✓ | ✓ | | | 93.62 | 99.26 | 2.026 | |
✓ | | ✓ | | 94.68 | 99.47 | 3.041 | |
✓ | ✓ | ✓ | | 88.93 | 98.04 | 2.850 | 1.360 | 1.356
✓ | ✓ | ✓ | ✓ | 87.76 | 97.27 | 2.815 | 1.388 | 1.414
Attention map comparison
nABN
• Patchy, fragmented attention maps
nABN + instance segmentation
• Sharp attention maps
[Figure: input video with the two attention maps]
Feature shift
Shift for lightweight image recognition
n Convolution
• Mixes features by filtering in each layer
• The filter weights are learned parameters
n Shift
• Mixes features by a plain shift operation
• The actual mixing is done by a 1x1 conv after the shift
• No learned parameters
• The shift amount is a hyperparameter
• (a minimal sketch follows after this list)
n Methods
• CNN
• Shift [Wu+, CVPR2018]
• ShiftResNet [Chen+, CVPR2019]
• AddressNet [He+, WACV2019]
• ViT
• ShiftViT [Wang+, AAAI2022]
[Chen+, CVPR2019]
By Aphex34 - Own work, CC BY-SA 4.0
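A minimal sketch of a shift layer, assuming four shift directions of one pixel with equal channel groups (the grouping and shift amounts are hyperparameters). The parameter-free shift only moves features; the following 1x1 convolution does the actual mixing.

```python
import torch
import torch.nn as nn

def spatial_shift(x: torch.Tensor) -> torch.Tensor:
    """Shift channel groups of x (N, C, H, W) left/right/up/down by one
    pixel with zero padding; no learnable parameters."""
    n, c, h, w = x.shape
    g = c // 4
    out = torch.zeros_like(x)
    out[:, 0*g:1*g, :, :-1] = x[:, 0*g:1*g, :, 1:]   # shift left
    out[:, 1*g:2*g, :, 1:]  = x[:, 1*g:2*g, :, :-1]  # shift right
    out[:, 2*g:3*g, :-1, :] = x[:, 2*g:3*g, 1:, :]   # shift up
    out[:, 3*g:, 1:, :]     = x[:, 3*g:, :-1, :]     # shift down
    return out

# the actual feature mixing is done by a 1x1 conv after the shift
mix = nn.Conv2d(64, 64, kernel_size=1)
y = mix(spatial_shift(torch.randn(2, 64, 32, 32)))
```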
TSM
nTemporal Shift Module
nIntroduces feature shift to video
• frame-wise 2D & late fusion
• A 2D ResNet-50 is applied to each frame
• Features are averaged over time
• Temporal modeling
• Intermediate features are exchanged (shifted) with the neighboring time steps t+1/t-1 (see the sketch after the diagram below)
• Can also be viewed as a temporal 1D convolution with a fixed kernel
[Lin+, ICCV2019]
[Diagram: stacks of layers over frames t-1, t, t+1. Without TSM (left), each frame is processed independently and combined only by late fusion; with TSM (right), a shift is inserted before every layer so features flow between neighboring frames.]
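A minimal sketch of the temporal shift, assuming features of shape (N, T, C, H, W) and the common split where 1/8 of the channels are shifted in each temporal direction.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the time axis: frame t receives
    features from t+1 for one channel chunk and from t-1 for another."""
    n, t, c, h, w = x.size()
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted rest
    return out

# 2 clips, 8 frames, 64 channels, 16x16 feature maps
feat = torch.randn(2, 8, 64, 16, 16)
shifted = temporal_shift(feat)  # same shape, temporally mixed channels
```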
Temporal Cross-attention
for Action Recognition
Ryota Hashiguchi, Toru Tamaki
SSII2022/ACCVW2022
Feature shift for CNN and ViT
nTemporal Shift Module (TSM) [Lin+,
ICCV2019]
• 2D CNN layer features are temporally
shifted
nToken Shift Transformer (TokenShift)
[Zhang+, ACMMM2021]
• Shifting class tokens only
• Not fully exploiting spatio-temporal features
nProposed method: MSCA
• A new shift method based on the ViT
structure
• Multi-head Self+Cross Attention (MSCA)
[Lin+, ICCV2019]
[Zhang+, ACMMM2021]
ViT, TokenShift, and MSCA
[Diagram: encoder blocks repeated L times. ViT: Embedded → Norm → MSA → Norm → MLP with two residual additions. TokenShift: the same block with shift operations inserted around it. Ours: Norm → MSCA replaces Norm → MSA.]
Multi-head Self+Cross Attention (MSCA)
nPartially inserts cross-attention with the neighboring frames t-1 and t+1 (see the sketch after the next slide)
[Diagram: queries of frame t attend to keys/values taken from frames t-1, t, and t+1]
MSCA-KV
nShifts K and V along the head dimension (a minimal sketch follows below)
nHeads = 4 in the illustration
[Diagram: inside the encoder block (Embedded → Norm → MHA → Norm → MLP, ×L), the K and V of the shifted heads are taken from frames t-1 and t+1 while Q stays at frame t; attention then mixes the frames.]
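A heavily simplified sketch of MSCA-KV-style attention; the tensor shapes, the number of shifted heads, and the forward/backward split are assumptions made for illustration.

```python
import torch

def msca_kv(q, k, v, num_shift_heads=2):
    """q, k, v: per-frame projections of shape (B, T, heads, N, d).
    For the shifted heads, K and V are taken from frames t-1 / t+1
    instead of frame t, so queries attend across time."""
    b, t, h, n, d = q.shape
    half = num_shift_heads // 2
    k_s, v_s = k.clone(), v.clone()
    k_s[:, 1:, :half] = k[:, :-1, :half]                                # K from t-1
    v_s[:, 1:, :half] = v[:, :-1, :half]                                # V from t-1
    k_s[:, :-1, half:num_shift_heads] = k[:, 1:, half:num_shift_heads]  # K from t+1
    v_s[:, :-1, half:num_shift_heads] = v[:, 1:, half:num_shift_heads]  # V from t+1
    attn = torch.softmax(q @ k_s.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v_s
```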
Effect of the amount of head shift
nMSCA-KV
nBest when only one or two heads are shifted
nToo much shift hurts the performance

Table 1: Effect of the head shift amount on MSCA-KV performance on the Kinetics400 validation set. A shift amount of 0 corresponds to ViT. Since the number of heads is h = 12, 1/12 means shifting one head and 1/6 means shifting two heads.

shift | heads | top-1 | top-5
0 (ViT) | 0 | 75.65 | 92.19
1/12 | 1 | 76.47 | 92.88
1/6 | 2 | 76.07 | 92.61
1/4 | 3 | 75.66 | 92.30
1/3 | 4 | 74.72 | 91.91
Number of encoder blocks with MSCA
nReplaced MSA modules with MSCA
• 4, 8, or 12 modules near the top of the network
• Replacing all 12 modules is MSCA-KV
nUsing 8 or more MSCA modules works best
• The input video clip consists of 8 frames
• Shifting 8 or more times covers the temporal information of all 8 frames of the input clip

Table 3: Effect of the number of MSA modules replaced by MSCA. 12 corresponds to MSCA-KV, 0 to ViT.

# MSCA | top-1 | top-5
0 (ViT) | 75.65 | 92.19
4 | 75.67 | 92.19
8 | 76.40 | 92.77
12 | 76.47 | 92.88
Data augmentation
Data augmentation
nFor image recognition
• Rotation, horizontal/vertical flipping, noise
• Mix-based methods (a minimal sketch follows this list)
• CutOut [DeVries&Taylor, arXiv2017]
• Mixup [Zhang+, ICLR2018]
• CutMix [Yun+, ICCV2019]
• CG synthesis
• FlyingThings3D [Mayer+, CVPR2016]
• For optical flow estimation
• SURREAL [Varol+, CVPR2017]
• For human pose estimation
nVideo
• Horizontal flipping is sometimes not applied
[Figure: Cutout, Mixup, and CutMix examples]
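A minimal sketch of Mixup on a batch of video clips; CutMix mixes the labels the same way but replaces a random rectangle instead of blending the whole input.

```python
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Mixup: blend random sample pairs and their labels.
    x: clips (N, C, T, H, W); y: one-hot labels (N, K)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```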
[Excerpt from FlyingThings3D [Mayer+, CVPR2016]: given the intrinsic camera parameters and render settings, the 3D motion vector of each pixel is projected to a 2D pixel motion vector coplanar to the imaging plane, i.e., the optical flow; depth is read from each pixel's 3D position and converted to disparity using the known virtual stereo rig.]
albumentations [Buslaev+, Information, 2020]
imgaug [Jung+, 2020]
SURREAL
VideoMix
nAn application of CutMix [Yun+, ICCV2019] to video
• Three variants are proposed (a minimal sketch of S-VideoMix follows below)
• S-VideoMix
• Pastes the same rectangular region at every time step
• T-VideoMix
• Pastes the entire frame over a temporal interval
• ST-VideoMix
• Pastes a spatio-temporal volume
[Yun+, arXiv2020]
[Results fragment: top-1/top-5 accuracies 75.2/91.7, 77.0/93.1, 75.6/92.2; the row labels were lost in extraction]
[Figure: S-VideoMix, T-VideoMix, and ST-VideoMix, mixing Video A and Video B along the time and height axes]
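A minimal sketch of S-VideoMix under simplifying assumptions (CutMix-style box sampling, a single clip pair): the same rectangle from clip B is pasted into clip A at every frame, and the labels are mixed by the pasted area ratio.

```python
import torch

def s_videomix(clip_a, clip_b, label_a, label_b):
    """clips: (C, T, H, W); labels: one-hot (K,)."""
    c, t, h, w = clip_a.shape
    lam = torch.rand(1).item()                         # area ratio kept from clip A
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    y0 = torch.randint(0, h - cut_h + 1, (1,)).item()
    x0 = torch.randint(0, w - cut_w + 1, (1,)).item()
    mixed = clip_a.clone()
    # the same spatial box for all T frames (the "S" in S-VideoMix)
    mixed[:, :, y0:y0 + cut_h, x0:x0 + cut_w] = \
        clip_b[:, :, y0:y0 + cut_h, x0:x0 + cut_w]
    lam_eff = 1 - (cut_h * cut_w) / (h * w)            # actual mixing ratio
    return mixed, lam_eff * label_a + (1 - lam_eff) * label_b
```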
ObjectMix
Data Augmentation by
Copy-Pasting Objects in Videos
for Action Recognition
Jun Kimata, Tomoya Nitta, Toru Tamaki
SSII2022/MMAsia2022
Data augmentation for action recognition
nInspired by Copy-Paste [Ghiasi+, CVPR2021]
• A method for segmentation
• Cuts out only the instances and pastes them onto another image
nProposed method: ObjectMix
• Cuts out objects and pastes them onto another video
• Extracts objects from each frame
• Mixes videos while taking the objects into account
[Excerpt from Copy-Paste [Ghiasi+, CVPR2021]: the key idea is to paste objects from one image onto another, which yields a combinatorial number of new training samples via the choices of (1) the source/target image pair, (2) which instances to copy, and (3) where to paste them. Prior work decides where to paste by modeling the surrounding visual context; the authors find that simply picking objects at random and pasting them at random locations already gives a significant boost across settings.]
[Figures: source images → generated images; source videos → generated videos]
Proposed method: ObjectMix
nAlgorithm (a minimal sketch of the paste step follows below)
1. Prepare two source videos
2. Extract object regions from each video (Detectron2 [Wu+, 2019])
3. Cut out the objects using the extracted masks
4. Paste the cut-out objects onto the other video
[Figure: videos → masks → objects → generated results]
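A minimal sketch of the paste step, assuming per-frame binary masks have already been extracted by the instance segmentation; the area-based label mixing is an illustrative choice.

```python
import torch

def objectmix(clip_a, clip_b, masks_b, label_a, label_b):
    """clips: (C, T, H, W); masks_b: (T, H, W) binary object masks of clip B;
    labels: one-hot (K,). Objects of clip B are pasted onto clip A."""
    m = masks_b.unsqueeze(0).float()       # (1, T, H, W), broadcast over channels
    mixed = clip_a * (1 - m) + clip_b * m
    lam = 1 - m.mean().item()              # label weight = visible fraction of clip A
    return mixed, lam * label_a + (1 - lam) * label_b
```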
Experiment 1: ObjectMix
nHighest performance at around p = 0.6, where p is the probability of applying ObjectMix
nToo large a p reduces the performance
np = 0 is better in the early stages of training, but ObjectMix becomes equal or better as training proceeds
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

動画像理解のための深層学習アプローチ Deep learning approaches to video understanding

• 動画は最長10秒
• top1とtop5で評価
n種類
• Kinetics-400 [Kay+, arXiv2017]
• Kinetics-600 [Carreira+, arXiv2018]
• Kinetics-700 [Carreira+, arXiv2019]
• Kinetics-700-2020 [Smaira+, arXiv2020]
nポリシー:1動画から1クリップ
• HMDBやUCFは1動画から複数クリップ
[Kay+, arXiv2017]
(図:headbanging,shaking hands,robot dancing などのサンプルフレーム)
SSv2
nsomething-something v2
• アクションラベルではなく,名詞・動詞のパターンを理解するべき
• 174のテンプレート文=ラベル
• "Dropping [something] into [something]"
• "Stacking [number of] [something]"
• 「something」
• アクション対象の物体名が入るプレースホルダー
• 221k動画
• train 167k, val 25k, test 27k
• 平均4.03秒 (v1)
v1 [Goyal+, ICCV2017]
v2 [Mahdisoltani+, arXiv2018]
(図4:動画と説明文の例 — "Putting a white remote into a cardboard box","Pretending to put candy onto chair","Pushing a green chilli so that it falls off the table","Moving puncher closer to scissor")
Action recognition models
(年表図:2012〜2022の代表的な動作認識モデル.カテゴリ:Non-Deep/2D + 1D aggregation/(2+1)D CNN/restricted 3D/Full 3D/Vision Transformer)
(掲載モデル:DT,IDT,Two Stream,TSN,C3D,I3D,P3D,S3D,R(2+1)D,3D ResNet(R3D),Non-Local,TSM,SlowFast,X3D,TokenShift,ViViT,TimeSformer,STAM,Video Transformer Network,VidTr,X-ViT,VideoSwin.参照点:ImageNet,AlexNet,ResNet,U-Net,GAN,ViT,Kinetics)
Fusion:フレーム毎に2D CNNを適用
n2D CNNを動画像に適用する方法
• single:1フレームにだけ使用
• late fusion:各フレームに2D CNNを適用,最後に時間方向に集約
• early fusion:複数フレームを一度に入力
• TSN
• slow fusion:ネットワークの途中で徐々に統合していく(lateral fusion)
n考え方,用語は受け継がれている
• late fusionは健在
• フレーム毎に適用,単純平均・1D CNN・Transformerなどで集約(下のスケッチ参照)
• ベースラインとして利用
[Karpathy+, CVPR2014]
(図1:single/late/early/slow fusionの時間方向の情報統合方式の比較図.赤・緑・青のボックスはそれぞれ畳み込み・正規化・プーリング層)
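参考として,late fusion(フレーム毎に2D CNNを適用し,時間方向は単純平均で集約)の最小スケッチを以下に示す.PyTorchとtorchvisionを想定した説明用の仮実装であり,バックボーン(ここではresnet18)や次元は一例にすぎない.

import torch
import torch.nn as nn
import torchvision.models as models

class LateFusion(nn.Module):
    """各フレームに2D CNNを適用し,時間方向に平均で集約する最小例(仮実装)."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)  # 任意の2Dバックボーンでよい
        backbone.fc = nn.Identity()               # 特徴抽出器として使う(出力512次元)
        self.backbone = backbone
        self.head = nn.Linear(512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) のクリップ
        b, t, c, h, w = x.shape
        feats = self.backbone(x.flatten(0, 1))    # (B*T, 512) フレーム毎に適用
        feats = feats.view(b, t, -1).mean(dim=1)  # 時間方向に単純平均(late fusion)
        return self.head(feats)

clip = torch.randn(2, 8, 3, 224, 224)        # 2クリップ,各8フレーム
logits = LateFusion(num_classes=400)(clip)   # -> (2, 400)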
Two-Stream network
nRGBとオプティカルフローを利用
• それぞれが2D CNN
• RGB:フレーム1枚
• フロー:複数フレームのスタック
• フローをつなげたtrajectoryも試したが性能悪い
nフローの利用
• これ以降,RGBフレームを用いるCNNモデルは「最後に,フローを追加すると性能ブースト」を示すことになる(下のスケッチ参照)
n大量の派生手法あり
[Simonyan&Zisserman, NIPS2014]
(図:spatial stream ConvNet(RGBフレーム1枚)とtemporal stream ConvNet(複数フレームのオプティカルフロー)の2つのネットワークをclass scoreで統合するアーキテクチャ)
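two-streamのclass score fusionのイメージを示す最小スケッチ(PyTorch想定.ネットワーク構成は論文のものではなく説明用の小さな仮のCNN):

import torch
import torch.nn as nn

def make_stream(in_ch: int, num_classes: int) -> nn.Module:
    # 説明用のごく小さな2D CNN(実際の論文ではより深いネットワーク)
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 7, stride=4), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
    )

rgb_net = make_stream(3, 101)        # spatial stream:RGBフレーム1枚
flow_net = make_stream(2 * 10, 101)  # temporal stream:フローL=10枚(x,y)のスタック

rgb = torch.randn(4, 3, 224, 224)
flow = torch.randn(4, 20, 224, 224)
score = (rgb_net(rgb) + flow_net(flow)) / 2  # class score fusion(単純平均)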
TSN
nTemporal Segment Networks
• 単純な工夫(下のスケッチ参照)
• クリップを分割
• それぞれでtwo-stream
• 最後に統合
• 長期の時間的モデリング
• フローは短期の時間的モデリング
[Wang+, ECCV2016]
(図:クリップをセグメントに分割し,各セグメントのtwo-streamの出力を統合する構成)
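TSNのセグメントサンプリングと平均consensusの考え方を示す最小スケッチ(関数名・実装は説明用の仮のもの):

import torch

def tsn_consensus(segment_logits: torch.Tensor) -> torch.Tensor:
    """各セグメントのスコアを平均consensusで統合(TSNの基本形)."""
    # segment_logits: (B, K, num_classes) Kセグメント分のスコア
    return segment_logits.mean(dim=1)

def sample_segments(num_frames: int, k: int) -> list:
    """動画をK個のセグメントに等分し,各セグメントから1フレームを選ぶ
    (ここでは簡単のため各セグメントの中央.学習時はセグメント内でランダムに選ぶ)."""
    seg_len = num_frames / k
    return [int(seg_len * i + seg_len / 2) for i in range(k)]

print(sample_segments(300, 3))  # 例: [50, 150, 250]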
(2+1)D CNN
n3D Convの計算量削減
• 空間方向2D convと時間方向の1D convを組み合わせる
nSeparable convの代表例
• P3D [Qiu+, ICCV2017]
• S3D [Xie+, ECCV2018]
• R(2+1)D [Tran+, CVPR2018]
• R3Dも比較(3D ResNetのこと)
nモジュールの中で2D/1D conv
• P3D, S3D
n層ごとに2Dと3Dを分ける
• S3D,MCx/rMCx
(図2 [Tran+, CVPR2018]:t×d×dのフル3D convを,1×d×dの空間2D convとt×1×1の時間1D convに分解する(2+1)Dブロック.中間の2Dフィルタ数は,パラメータ数がフル3D convと揃うように M_i = ⌊t·d²·N_{i-1}·N_i / (d²·N_{i-1} + t·N_i)⌋ と選ぶ)
(図3 [Tran+, CVPR2018]:R(2+1)DはR3Dより学習誤差が小さく,特に深いネットワークで時空間分解が最適化を容易にする)
(図 [Xie+, ECCV2018]:2D/3D/separable Inceptionブロックと,I3D・I2D・Bottom-heavy/Top-heavy I3Dのネットワーク構成)
(図 [Qiu+, ICCV2017]:Residual UnitとP3D-A/B/Cのbottleneckブロック)
(図 [Tran+, CVPR2018]:R2D,MCx,rMCx,R3D,R(2+1)Dの各アーキテクチャ)
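R(2+1)D型の分解を,上のM_iの式に従う中間チャネル数で実装した最小スケッチ(PyTorch想定の説明用の仮実装):

import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """3D conv (t×d×d) を空間2D conv (1×d×d) + 時間1D conv (t×1×1) に分解.
    中間チャネル数midは,パラメータ数がフル3D convと同程度になるM_iの式に従う."""
    def __init__(self, c_in: int, c_out: int, t: int = 3, d: int = 3):
        super().__init__()
        mid = (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)
        self.spatial = nn.Conv3d(c_in, mid, (1, d, d), padding=(0, d // 2, d // 2))
        self.relu = nn.ReLU(inplace=True)
        self.temporal = nn.Conv3d(mid, c_out, (t, 1, 1), padding=(t // 2, 0, 0))

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.temporal(self.relu(self.spatial(x)))

x = torch.randn(1, 64, 8, 56, 56)
y = Conv2Plus1D(64, 64)(x)  # -> (1, 64, 8, 56, 56)
# c_in=c_out=64, t=d=3 のときmid=144となり,パラメータ数は3×3×3のフル3D convと一致する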
3D ResNet
nResNetの3D版
• 大規模なKineticsで学習すれば層の多い3D ResNetで性能が出ることを示した
n補足
• 論文名から内容とモデル名が分かりにくい
• そのためあまり引用されてない...?
[Hara+, CVPR2018]
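ResNetのbasic blockを3D convに置き換えた構成の最小スケッチ(説明用の仮実装.実際のモデルはストライドによるダウンサンプリングやチャネル数変更を含む):

import torch
import torch.nn as nn

class BasicBlock3D(nn.Module):
    """3D conv版のResNet basic block(3D ResNet風の最小例)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (B, C, T, H, W)
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(out + x)  # 残差接続

y = BasicBlock3D(64)(torch.randn(1, 64, 8, 56, 56))  # 形状は変わらない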
X3D
n最適なアーキテクチャを探す
• NAS (Network Architecture Search)による探索
• SlowFastのFast(ResNet)をベース
• 複数のパラメータを変更
• 空間・時間解像度,チャネル数など
• greedyに探索
• 規模の異なる複数のX3Dを提案
• かなり軽量
[Feichtenhofer, CVPR2020]
(図2:X2Dを起点に各軸を段階的に拡張してX3D-XS〜XLを得る過程と,GFLOPs対Kinetics top-1精度のプロット)
(表2:K400-valでの精度と計算量,10-center clip testing)
model   top-1  top-5  FLOPs(G)  Params(M)
X3D-XS  68.6   87.9   0.60      3.76
X3D-S   72.9   90.5   1.96      3.76
X3D-M   74.6   91.7   4.73      3.76
X3D-L   76.8   92.5   18.37     6.08
X3D-XL  78.4   93.6   35.84     11.0
フレーム毎にVision Transformer+late fusion
n(2+1)D Transformer
• STAM [Sharir+, arXiv2021]
• Video Transformer Network [Neimark+, ICCVW2021]
nVidTr [Zhang+, ICCV2021]
• Transformerブロック内で(2+1)Dアテンション
nX-ViT [Bulat+, NeurIPS2021]
• アテンションを前後フレームに制限
(図 [Bulat+, NeurIPS2021]:アテンション方式の計算量比較 — (a) full space-time: O(T²S²),(b) spatial-only: O(TS²),(c) TimeSformer/ViViT (Model 3): O(T²S + TS²),(d) X-ViT: O(TS²))
(図 [Neimark+, ICCVW2021]:VTNのアーキテクチャ — 2D空間バックボーンf(x)でフレーム特徴を抽出し,位置エンコーディングを加えてLongformerベースの時間エンコーダへ入力,[CLS]トークンをMLPヘッドで識別)
(図 [Zhang+, ICCV2021]:VidTrの時空間分離アテンション.full attentionではアテンション行列のメモリが系列長の2乗で増えるため,分離により削減する)
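STAMやVTNのような「フレーム毎にViT+時間方向Transformer」型のlate fusionの最小スケッチ(PyTorch想定.frame_encoderやヘッド構成は説明用の仮のもの):

import torch
import torch.nn as nn

class FrameViTLateFusion(nn.Module):
    """フレーム毎に2Dエンコーダで特徴を取り,[CLS]付きの時間方向Transformerで
    集約する(2+1)D Transformer型の最小例(仮実装)."""
    def __init__(self, frame_encoder: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.frame_encoder = frame_encoder  # (B, 3, H, W) -> (B, dim) を想定
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):  # x: (B, T, 3, H, W)
        b, t = x.shape[:2]
        f = self.frame_encoder(x.flatten(0, 1)).view(b, t, -1)  # フレーム特徴列
        f = torch.cat([self.cls.expand(b, -1, -1), f], dim=1)   # 先頭に[CLS]
        return self.head(self.temporal(f)[:, 0])                # [CLS]で識別

enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # ViTの代わりの仮エンコーダ
model = FrameViTLateFusion(enc, dim=256, num_classes=400)
print(model(torch.randn(2, 8, 3, 32, 32)).shape)  # torch.Size([2, 400])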
ViViT
nModel 1:3D ViT
• ViTへのパッチを3Dにする(下のスケッチ参照)
nModel 2:late fusion,(2+1)D
• フレーム毎に2D Transformer
• 時間方向に1D Transformer
[Arnab+, ICCV2021]
(図:Model 2(factorised encoder)の構成 — 各フレームをSpatial Transformer Encoderで符号化し,[CLS]出力の列にtemporal embeddingを加えてTemporal Transformer Encoderで統合,MLPヘッドで識別)
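ViViT Model 1の入力に相当する3Dパッチ(tubelet)のトークン化の最小スケッチ(PyTorch想定.パッチサイズや次元は説明用の仮の値):

import torch
import torch.nn as nn

# 3Dパッチ(tubelet)をConv3dで埋め込みトークン化する最小例
tubelet_embed = nn.Conv3d(
    in_channels=3, out_channels=768,
    kernel_size=(2, 16, 16), stride=(2, 16, 16),  # 2フレーム×16×16画素を1トークンに
)
video = torch.randn(1, 3, 16, 224, 224)           # (B, C, T, H, W)
tokens = tubelet_embed(video).flatten(2).transpose(1, 2)
print(tokens.shape)  # torch.Size([1, 1568, 768]) = 8×14×14トークン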
TimeSformer
nアテンションを制限
• Divided attention
n(2+1)D
• 空間:同じフレーム内のパッチ
• 時間:異なるフレーム同じ位置のパッチ(下のスケッチ参照)
[Bertasius+, ICML2021]
(図("Is Space-Time Attention All You Need for Video Understanding?"):5つのアテンション方式の比較 — Space Attention (S),Joint Space-Time Attention (ST),Divided Space-Time Attention (T+S),Axial Attention (T+W+H),Sparse Local Global Attention (L+G))
MLP z(`) <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> z(` 1) <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit 
sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> z(`) <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> z(`) <latexit 
sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> z(`) <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit 
sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> z(`) <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> <latexit sha1_base64="vr9M27hwkmJn9HyewcyBFvyrmCg=">AAAB+XicbVBNS8NAEN3Ur1q/oh69LBahXkoigh6LXjxWsB/QxLLZTtqlm03Y3RRqyD/x4kERr/4Tb/4bt20O2vpg4PHeDDPzgoQzpR3n2yqtrW9sbpW3Kzu7e/sH9uFRW8WppNCiMY9lNyAKOBPQ0kxz6CYSSBRw6ATj25nfmYBULBYPepqAH5GhYCGjRBupb9uZF4T4KX/Mah5wfp737apTd+bAq8QtSBUVaPbtL28Q0zQCoSknSvVcJ9F+RqRmlENe8VIFCaFjMoSeoYJEoPxsfnmOz4wywGEsTQmN5+rviYxESk2jwHRGRI/UsjcT//N6qQ6v/YyJJNUg6GJRmHKsYzyLAQ+YBKr51BBCJTO3YjoiklBtwqqYENzll1dJ+6LuOnX3/rLauCniKKMTdIpqyEVXqIHuUBO1EEUT9Ixe0ZuVWS/Wu/WxaC1Zxcwx+gPr8wcJUZNB</latexit> z(` 1) <latexit 
sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> z(` 1) <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit 
sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> z(` 1) <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> z(` 1) <latexit 
sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> <latexit sha1_base64="PqFOkq34IvzyoSmk0/rjIYa/Lb0=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16WSxCPVgSEfRY9OKxgv2ANpbNdtIu3WzC7kasIX/FiwdFvPpHvPlv3LY5aOuDgcd7M8zM82POlHacb6uwsrq2vlHcLG1t7+zu2fvllooSSaFJIx7Jjk8UcCagqZnm0IklkNDn0PbH11O//QBSsUjc6UkMXkiGggWMEm2kvl1Oe36An7L7tNoDzk/dk6xvV5yaMwNeJm5OKihHo29/9QYRTUIQmnKiVNd1Yu2lRGpGOWSlXqIgJnRMhtA1VJAQlJfObs/wsVEGOIikKaHxTP09kZJQqUnom86Q6JFa9Kbif1430cGllzIRJxoEnS8KEo51hKdB4AGTQDWfGEKoZOZWTEdEEqpNXCUTgrv48jJpndVcp+benlfqV3kcRXSIjlAVuegC1dENaqAmougRPaNX9GZl1ov1bn3MWwtWPnOA/sD6/AHtlpOz</latexit> Figure 1. The video self-attention blocks that we investigate in this work. Each attention layer implements self-attention (Vaswani et al., 2017b) on a specified spatiotemporal neighborhood of frame-level patches (see Figure 2 for a visualization of the neighborhoods). We use residual connections to aggregate information from different attention layers within each block. A 1-hidden-layer MLP is applied at the end of each block. The final model is constructed by repeatedly stacking these blocks on top of each other. the N patches span the entire frame, i.e., N = HW/P2 .
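For the divided space-time (T+S) variant, here is a minimal sketch (assuming patch tokens of shape (B, T, N, D) and omitting the class token; an illustration, not the TimeSformer implementation): temporal attention across frames is applied first, then spatial attention within each frame, each with a residual connection.

```python
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    """Minimal divided space-time (T+S) attention block: time first, then space."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))  # 1-hidden-layer MLP

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, N, D)
        B, T, N, D = x.shape
        # Time attention: each patch position attends over the T frames.
        xt = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        h = self.norm1(xt)
        xt = xt + self.time_attn(h, h, h)[0]
        x = xt.reshape(B, N, T, D).permute(0, 2, 1, 3)
        # Space attention: each frame attends over its N patches.
        xs = x.reshape(B * T, N, D)
        h = self.norm2(xs)
        xs = xs + self.space_attn(h, h, h)[0]
        x = xs.reshape(B, T, N, D)
        return x + self.mlp(self.norm3(x))
```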
  • 21. Object-ABN: Sharp Attention Map Generation for Action Recognition — Tomoya Nitta (Nagoya Institute of Technology), Tsubasa Hirakawa (Chubu University), Hironobu Fujiyoshi (Chubu University), Toru Tamaki (Nagoya Institute of Technology) SSII2022/IEICE-ED2023
  • 22. Research motivation nVideo recognition • Technology for recognizing actions and activities in videos • Handles image and temporal information simultaneously nAttention mechanism • Determines where in the input data to attend • Transformer (Vaswani+, NIPS2017) • Also used in video recognition • GTA (He+, BMVC2021) • Non-Local Neural Network (Wang+, CVPR2018) • Attention maps are used for explainable AI etc. • ABN (Fukui+, CVPR2019) [Examples: input videos labeled "diving" and "juggling a ball", with their attention maps]
  • 23. Object-ABN: feature extractor → attention branch → classifier, combined with instance segmentation (see the sketch below). Per-object masks (ℝ^{N×T×H×W}) are OR-ed into a single all-object mask (ℝ^{1×T×H×W}) and combined with the attention map (ℝ^{1×T×H×W}).
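A small sketch of the mask handling (the shapes and the L2 form of the mask loss are assumptions for illustration; the paper defines its own loss terms such as Lmask):

```python
import torch
import torch.nn.functional as F

def merge_object_masks(masks: torch.Tensor) -> torch.Tensor:
    """OR per-object masks (N, T, H, W) into one all-object mask (1, T, H, W)."""
    return masks.any(dim=0, keepdim=True).float()

# Hypothetical shapes: N = 5 objects, T = 8 frames, 56 x 56 resolution.
obj_masks = torch.rand(5, 8, 56, 56) > 0.5   # from instance segmentation
attn_map = torch.rand(1, 8, 56, 56)          # from the attention branch

# One plausible mask loss: pull the attention map toward the merged mask.
l_mask = F.mse_loss(attn_map, merge_object_masks(obj_masks))
```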
  • 24. 4. Experiments — Table 1: Performance on the UCF101 validation set. MHA denotes multi-head attention; the row with only Lper/attn corresponds to the original ABN. For the entropy columns, "object" is computed on the channel M[:,1,:,:] to which the mask loss is applied, "inverse" on the background channel M[:,2,:,:], and the unlabeled column on the channel M[:,0,:,:] for which no mask loss is computed.
Lper/attn  Lmask  MHA  LPC  top-1  top-5  entropy  entropy(object)  entropy(inverse)
✓          -      -    -    93.96  99.15  3.064
✓          ✓      -    -    93.62  99.26  2.026
✓          -      ✓    -    94.68  99.47  3.041
✓          ✓      ✓    -    88.93  98.04  2.850    1.360            1.356
✓          ✓      ✓    ✓    87.76  97.27  2.815    1.388            1.414
Attention map comparison nABN • Patchy attention maps nABN + instance segmentation • Sharp attention maps [Examples on input videos]
  • 26. Shift for lightweight image recognition nConvolution • Mixes features by filtering at each layer • The filter weights are learned parameters nShift • Mixes features by a simple shift operation • The actual mixing is done by the 1x1 convolution after the shift • No learned parameters • How far to shift is a hyperparameter (see the sketch below) nMethods • CNN • Shift [Wu+, CVPR2018] • ShiftResNet [Chen+, CVPR2019] • AddressNet [He+, WACV2019] • ViT • ShiftViT [Wang+, AAAI2022] [Figure from Chen+, CVPR2019; convolution illustration by Aphex34 - Own work, CC BY-SA 4.0]
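A minimal sketch of the shift idea (the four-group up/down/left/right assignment and the one-pixel offsets are illustrative assumptions, not any specific paper's code): the shift itself has no parameters; the 1×1 convolution that follows does the mixing.

```python
import torch
import torch.nn as nn

class ShiftConv(nn.Module):
    """Parameter-free spatial shift followed by a 1x1 conv that mixes channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        g = x.shape[1] // 4                    # four channel groups
        out = torch.zeros_like(x)              # boundaries are zero-padded
        out[:, 0*g:1*g, :-1, :] = x[:, 0*g:1*g, 1:, :]    # shift up
        out[:, 1*g:2*g, 1:, :] = x[:, 1*g:2*g, :-1, :]    # shift down
        out[:, 2*g:3*g, :, :-1] = x[:, 2*g:3*g, :, 1:]    # shift left
        out[:, 3*g:, :, :] = x[:, 3*g:, :, :]             # rest: no shift
        return self.mix(out)                   # 1x1 conv does the actual mixing
```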
  • 27. TSM nTemporal Shift Module [Lin+, ICCV2019] nIntroduces feature shift to video • Frame-wise 2D & late fusion • A 2D ResNet-50 is applied to each frame • Features are averaged over the temporal direction • Temporal modeling • Intermediate features are exchanged (shifted) with the adjacent frames t+1 / t-1 • Can also be seen as a temporal 1D convolution with a fixed kernel (see the sketch below) [Diagram: per-frame layers at t-1, t, t+1 with late fusion, without TSM vs. with TSM, where a shift is inserted between layers]
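A minimal sketch of the temporal shift (the shifted fraction, here 1/8 of the channels in each direction, follows a common TSM setting but is a hyperparameter):

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (B, T, C, H, W). Move 1/fold_div of the channels one frame forward,
    another 1/fold_div one frame backward; clip boundaries are zero-padded."""
    fold = x.shape[2] // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # features of t appear at t+1
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # features of t appear at t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unchanged
    return out
```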
  • 28. Temporal Cross-attention for Action Recognition Ryota Hashiguchi, Toru Tamaki SSII2022/ACCVW2022
  • 29. Feature shift for CNN and ViT nTemporal Shift Module (TSM) [Lin+, ICCV2019] • 2D CNN layer features are temporally shifted nToken Shift Transformer (TokenShift) [Zhang+, ACMMM2021] • Shifting class tokens only • Not fully exploiting spatio-temporal features nProposed method: MSCA • A new shift method based on the ViT structure • Multi-head Self+Cross Attention (MSCA) [Lin+, ICCV2019] [Zhang+, ACMMM2021]
  • 30. ViT, TokenShift, and MSCA [Block diagrams, each repeated L times on embedded patch tokens: ViT: Norm → MSA → residual, Norm → MLP → residual; TokenShift: the same block with shift operations inserted; Ours: the MSA sub-block replaced by MSCA]
  • 31. Multi-head Self+Cross Attention (MSCA) nPartially inserts cross-attention with the adjacent frames t-1 and t+1 [Diagram: Q/K/V at times t-1, t, t+1; block: Embedded → Norm → MHA → residual → Norm → MLP → residual, repeated L times]
  • 32. MSCA-KV nShift K and V along the head dimension nHeads = 4 (see the sketch below) [Diagram: Q/K/V heads at t-1, t, t+1; K and V heads shifted across time before attention]
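A minimal sketch of the head-direction K/V shift (shapes, the number of shifted heads, and the wrap-around at clip boundaries are simplifying assumptions, not the authors' code): keys and values of some heads are taken from the neighboring frames, so those heads compute cross-attention with t-1 and t+1 while the remaining heads stay self-attention.

```python
import torch

def shift_kv_heads(k: torch.Tensor, v: torch.Tensor, n: int = 1):
    """k, v: (B, T, H, N, d) with H attention heads.
    Heads [0, n) take K/V from frame t-1; heads [n, 2n) from frame t+1;
    the remaining heads keep their own frame (plain self-attention)."""
    def shift(x: torch.Tensor) -> torch.Tensor:
        out = x.clone()
        out[:, :, :n] = x[:, :, :n].roll(1, dims=1)          # K/V from t-1
        out[:, :, n:2 * n] = x[:, :, n:2 * n].roll(-1, dims=1)  # K/V from t+1
        return out
    return shift(k), shift(v)

# Hypothetical use: B=2 clips, T=8 frames, 4 heads, N=196 tokens, d=64.
k = torch.randn(2, 8, 4, 196, 64)
v = torch.randn(2, 8, 4, 196, 64)
k, v = shift_kv_heads(k, v, n=1)  # queries stay at t: attention becomes Q_t · K_{t±1}
```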
  • 33. Effect of the amount of head shift nMSCA-KV nBest when only one or two heads are shifted nToo much shift hurts the performance
Table 1: Effect of the head-shift amount on MSCA-KV performance on the Kinetics400 validation set. A shift amount of 0 corresponds to ViT; since the number of heads is h = 12, 1/12 means shifting one head and 1/6 means shifting two.
shift    heads  top-1  top-5
0 (ViT)  0      75.65  92.19
1/12     1      76.47  92.88
1/6      2      76.07  92.61
1/4      3      75.66  92.30
1/3      4      74.72  91.91
[Diagram: MSCA-KV with Q/K/V shift and attention across t-1, t, t+1]
  • 34. Number of encoder blocks with MSCA nReplaced MSA modules with MSCA • 4, 8, or 12 modules near the top of the network • Replacing all 12 modules is MSCA-KV nUsing 8 or more MSCA modules works best • The input video clip consists of 8 frames • Shifting 8 or more times covers the temporal information of all 8 frames of the input clip
Table 3: Effect of the number of MSA modules replaced with MSCA. 12 corresponds to MSCA-KV, 0 to ViT.
# MSCA   top-1  top-5
0 (ViT)  75.65  92.19
4        75.67  92.19
8        76.40  92.77
12       76.47  92.88
  • 36. Data augmentation nFor image recognition • Rotation, horizontal/vertical flips, noise • Mix-based methods • CutOut [DeVries&Taylor, arXiv2017] • Mixup [Zhang+, ICLR2018] • CutMix [Yun+, ICCV2019] • CG synthesis • FlyingThings3D [Mayer+, CVPR2016]: for optical flow estimation • SURREAL [Varol+, CVPR2017]: for human pose estimation nFor video • Horizontal flipping is sometimes not used [Examples: Cutout, Mixup, CutMix, FlyingThings3D, SURREAL] albumentations [Buslaev+, Information, 2020], imgaug [Jung+, 2020]
  • 37. VideoMix [Yun+, arXiv2020] nAn application of CutMix [Yun+, ICCV2019] to video • Three variants are proposed (see the sketch below) • S-VideoMix • Paste the same rectangular region at every time step • T-VideoMix • Paste the entire frame over a temporal interval • ST-VideoMix • Paste a spatio-temporal cuboid [Diagram: S-/T-/ST-VideoMix mixing Video A and Video B along the time and height axes]
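A minimal sketch of the spatio-temporal variant (ST-VideoMix) under assumed tensor shapes; label mixing follows the CutMix convention of weighting by the pasted volume ratio:

```python
import torch

def st_videomix(video_a, video_b, label_a, label_b):
    """video_*: (C, T, H, W). Paste a random spatio-temporal cuboid of B into A
    and mix the labels by the cuboid's volume ratio (CutMix convention)."""
    C, T, H, W = video_a.shape
    t, h, w = [torch.randint(1, s + 1, (1,)).item() for s in (T, H, W)]
    t0 = torch.randint(0, T - t + 1, (1,)).item()
    y0 = torch.randint(0, H - h + 1, (1,)).item()
    x0 = torch.randint(0, W - w + 1, (1,)).item()
    mixed = video_a.clone()
    mixed[:, t0:t0 + t, y0:y0 + h, x0:x0 + w] = \
        video_b[:, t0:t0 + t, y0:y0 + h, x0:x0 + w]
    lam = 1.0 - (t * h * w) / (T * H * W)     # fraction of video A remaining
    label = lam * label_a + (1.0 - lam) * label_b
    return mixed, label
```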
  • 38. ObjectMix: Data Augmentation by Copy-Pasting Objects in Videos for Action Recognition — Jun Kimata, Tomoya Nitta, Toru Tamaki SSII2022/MMAsia2022
  • 39. Data augmentation for action recognition nInspired by: Copy-Paste [Ghiasi+, CVPR2021] • A method for instance segmentation • Cuts out only the instances and pastes them onto another image nProposed method: ObjectMix • Cuts objects out and pastes them onto another video • Extracts objects from each frame • Mixes the two videos while preserving their objects [Figure: Copy-Paste: source images → generated images; ObjectMix: source videos → generated videos]
  • 40. Proposed method: ObjectMix nAlgorithm (see the sketch below) 1. Prepare two source videos 2. Extract object regions from each video (Detectron2 [Wu+, 2019]) 3. Cut out the objects using the extracted masks 4. Paste the cut objects onto the other video [Figure: videos → masks → objects → generated videos]
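A minimal sketch of steps 3 and 4 under assumed shapes (per-frame binary masks from a detector such as Detectron2; an illustration, not the authors' implementation):

```python
import torch

def objectmix(video_a, mask_a, video_b):
    """video_*: (C, T, H, W); mask_a: (T, H, W) binary union of object regions
    in video A. Pastes A's objects onto video B frame by frame."""
    m = mask_a.unsqueeze(0).float()          # (1, T, H, W), broadcast over channels
    return m * video_a + (1.0 - m) * video_b

# Hypothetical shapes: 8-frame clips at 112 x 112.
va = torch.rand(3, 8, 112, 112)
vb = torch.rand(3, 8, 112, 112)
ma = (torch.rand(8, 112, 112) > 0.7).float() # stand-in for detector masks
generated = objectmix(va, ma, vb)
```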
  • 41. Experiment 1: ObjectMix nHighest performance around p = 0.6, where p is the probability of applying ObjectMix nToo large a p reduces performance nTraining with p = 0 is better in the early stages, but training with ObjectMix later becomes equal or better