The use of robots in cell production systems faces various challenges; one of them is the assembly of three or more objects. In general, when multiple objects are assembled simultaneously, each target part is held independently by a robot arm or a jig. However, this approach requires as many robot arms or jigs as there are parts, and the more parts there are, the more wasteful it becomes in terms of cost and installation space. To address this problem, 音𣷓 et al. analyzed the contact forces acting on the objects being assembled and derived conditions under which an object that is not fixed by a jig is unlikely to move during the assembly operation. In other words, they examined assembly conditions by considering the robustness of ungrasped objects in the environment. Based on this approach, this study aims to execute the assembly of multiple objects with a single-arm manipulator. We propose a method that handles multiple temporarily assembled objects simultaneously by considering their robustness. Taking the assembly of pipe joints as the target task, we show that a single-arm manipulator can grasp multiple objects simultaneously with the help of a simple tool. Furthermore, to improve the task success rate, we implement robot control and motion planning based on object position detection with an RGB-D camera.
This paper discusses assembly operations that use a single manipulator with a parallel gripper to simultaneously grasp multiple objects and hold the group of temporarily assembled objects. Assembly tasks are generally performed by multiple robots and jigs that constrain the target objects mechanically or geometrically to prevent them from moving. To achieve such tasks with a single gripper, it is necessary to analyze the physical interactions between the objects that realize these constraints. In this paper, we focus on assembling pipe joints as an example and discuss how to constrain the motion of the objects. Our demonstration shows that a simple tool can facilitate holding multiple objects with a single gripper.
[DL Seminar] XFeat: Accelerated Features for Lightweight Image Matching (harmonylab)
Public URL: https://arxiv.org/pdf/2404.19174
Source: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento: XFeat: Accelerated Features for Lightweight Image Matching, Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Overview: The paper proposes XFeat (Accelerated Features), a lightweight architecture for resource-efficient feature matching. The method revisits the basic design of convolutional neural networks for detecting, extracting, and matching local features. In particular, because resource-limited devices need fast yet robust algorithms, the network limits its number of channels while keeping the resolution as high as possible. In addition, the design allows selecting sparse matching, which makes it suitable for applications such as navigation and AR. XFeat is fast while achieving comparable or better accuracy, and runs in real time on the CPU of an ordinary laptop.
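The design point above (limit the channel count, keep the resolution high) can be made concrete with a rough multiply-accumulate estimate; the layer shapes below are illustrative assumptions, not XFeat's actual configuration:

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """Approximate multiply-accumulates of one k x k convolution layer:
    cost is linear in the spatial area (h * w) but quadratic in the
    channel counts (c_in * c_out)."""
    return h * w * c_in * c_out * k * k

# Baseline layer on a VGA-sized feature map (illustrative numbers).
base = conv_flops(480, 640, 64, 64)

# Halving both channel counts cuts the cost 4x while keeping the full
# spatial resolution available for precise keypoint localization ...
thin = conv_flops(480, 640, 32, 32)

# ... whereas halving the resolution gives the same 4x saving but
# produces coarser feature maps.
coarse = conv_flops(240, 320, 64, 64)
```

Both options save the same compute in this toy estimate, so spending the budget on resolution rather than channels is a deliberate design choice, not a free win.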
79. Hard negative and positive mining
Update the CNN on pairs that are difficult to match
Find hard negative and hard positive pairs by forward propagation
Backpropagate again on the hard negatives and hard positives
Simo-Serra et al., "Discriminative Learning of Deep Convolutional Feature Point Descriptors", ICCV 2015. Figures partially reproduced from the poster.
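The procedure above (one forward pass over many candidate pairs, then a backward pass on only the hardest ones) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the linear `descriptor`, the pool size of 256, and the margin `C = 4.0` are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def descriptor(x, W):
    """Toy stand-in for the CNN descriptor: a linear map (assumption)."""
    return x @ W

def hinge_embedding_loss(d1, d2, positive, C=4.0):
    """Distance for positive pairs; margin hinge for negative pairs."""
    dist = np.linalg.norm(d1 - d2, axis=1)
    return np.where(positive, dist, np.maximum(0.0, C - dist))

# Forward-propagate a pool of 256 candidate pairs ...
W = rng.normal(size=(32, 16))
x1 = rng.normal(size=(256, 32))
x2 = rng.normal(size=(256, 32))
positive = rng.random(256) < 0.5
losses = hinge_embedding_loss(descriptor(x1, W), descriptor(x2, W), positive)

# ... and keep only the 128 hardest (largest-loss) pairs; in training,
# only these would be backpropagated to update the network.
hard_idx = np.argsort(losses)[-128:]
hard_x1, hard_x2, hard_pos = x1[hard_idx], x2[hard_idx], positive[hard_idx]
```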
• Consistent improvements over the state of the art.
• Trained on one dataset, but generalizes very well to scaling, rotation, deformation and illumination changes.
• Computational efficiency (on GPU: 0.76 ms; dense SIFT: 0.14 ms).
Code is available: https://github.com/etrulls/deepdesc-release

Key observation
1. We train a Siamese architecture with pairs of patches. We want to bring matching pairs together and otherwise pull them apart.
2. Problem? Randomly sampled pairs are already easy to separate.
3. Solution: To train discriminative networks we use hard negative and positive mining. This proves essential for performance.

[Figure: (a) 12 points / 132 patches with t-SNE [8]; (b) all pairs: pos/neg; (c) "hard" pairs: pos/neg; (d) random pairs; (e) mined pairs]
We take samples from [1] for illustration. Corresponding patches are shown with the same color:
(a) Representation from t-SNE [8]. Distance encodes similarity.
(b) Random sampling: similar (close) positives and different (distant) negatives.
(c) We mine the samples to obtain dissimilar positives (+, long blue segments) and similar negatives (×, short red segments).
(d) Random sampling results in easy pairs.
(e) Mined pairs with harder correspondences.

This allows us to train discriminative models with a small number of parameters (∼45k), which also alleviates overfitting concerns.
[Network architecture figure: convolution strides 2, 3, 4]
Training data: the MVS Dataset [1], 64 × 64 grayscale patches: Statue of Liberty (LY, top), Notre Dame (ND, center), Yosemite (YO, bottom). ∼150k points and ∼450k patches each ⇒ on the order of 10^5 positive and 10^12 negative pairs ⇒ efficient exploration with mining.

We minimize the hinge embedding loss. With 3D point indices p1, p2 of patches x1, x2 and descriptor D:
l(x1, x2) = ‖D(x1) − D(x2)‖₂ if p1 = p2, and max(0, C − ‖D(x1) − D(x2)‖₂) otherwise.
This penalizes corresponding pairs that are placed far apart and non-corresponding pairs that are less than C units apart.

Methodology: train over two sets and test over the third, with cross-validation. Metric: precision-recall (PR curves) in a 'needle in a haystack' setting: pick 10k unique points and generate one positive pair and 1k negative pairs for each, i.e. 10k positives and 10M negatives. Results are summarized by the 'Area Under the Curve' (PR AUC).

Effect of mining
(a) Forward-propagate sp ≥ 128 positives and sn ≥ 128 negatives.
(b) Pick the 128 with the largest loss (for each) and backpropagate them.
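The 'needle in a haystack' precision-recall protocol can be sketched as follows; this NumPy illustration is scaled down (100 points with 50 negatives each instead of 10k points with 1k negatives each), and the Gaussian distance distributions are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# One positive pair per point, many negative pairs; descriptor distance
# is the matching score (smaller = more confident match).
n_points, n_neg = 100, 50
pos_dist = rng.normal(1.0, 0.5, size=n_points)
neg_dist = rng.normal(3.0, 0.5, size=n_points * n_neg)
dist = np.concatenate([pos_dist, neg_dist])
is_pos = np.concatenate([np.ones(n_points, bool),
                         np.zeros(n_points * n_neg, bool)])

# Sweep a distance threshold: sort by confidence and accumulate hits.
order = np.argsort(dist)
tp = np.cumsum(is_pos[order])
precision = tp / np.arange(1, dist.size + 1)
recall = tp / is_pos.sum()

# Summarize the PR curve by its area (step-function integration).
pr_auc = np.sum((recall[1:] - recall[:-1]) * precision[1:])
```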
[PR curves (precision vs. recall): "PR curve, training LY+YO, test ND" and "PR curve, training LY+ND, test YO"; a third plot is truncated. Each compares SIFT against CNN3 with mining ratios 1/2, 2/2, 4/4, and 8/8.]
sp     sn     PR AUC
128    128    0.366
256    256    0.374
512    512    0.369
1024   1024   0.325
Table 1: (a) No mining. Larger batches do not help.

sp     sn     rp    rn    Cost
128    256    1     2     20%
256    256    2     2     35%
512    512    4     4     48%
1024   1024   8     8     67%
Table 2: (b) Mining with rp = … The mining cost is incurred d…
[Figure annotations] Metric space of the features of each patch image; patches of the same color are correspondences. Red: negative pairs; blue: positive pairs. Hard positives and hard negatives are selected, and the network is updated to fit them.
92. Related papers at ICCV 2015
Local Convolutional Features with Unsupervised Training for Image Retrieval
Feature description with a two-layer Convolutional Kernel Network
Improves performance by preprocessing the input patches with whitening, gradient extraction, and so on
Aggregating Deep Convolutional Features for Image Retrieval
Represents the feature as a linear sum of each 2-D feature map output by the convolutional layers
Applies PCA to the features for compactness and to prevent overfitting
RIDE: Reversal Invariant Descriptor Enhancement
Proposes a SIFT feature that is invariant to image reversal
Improves performance on the Bag-of-Features problem
CNN-based methods
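The aggregation described above for "Aggregating Deep Convolutional Features for Image Retrieval" (sum each 2-D feature map into one value per channel, then compact with PCA) can be sketched as follows; the map sizes, the L2 normalization, and the SVD-based PCA are illustrative assumptions rather than that paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

def aggregate(feature_maps):
    """Sum-pool each 2-D feature map (C x H x W -> C), then L2-normalize."""
    v = feature_maps.sum(axis=(1, 2))
    return v / np.linalg.norm(v)

# Toy convolutional outputs for a small image collection: 64 maps of 8 x 8.
images = [rng.normal(size=(64, 8, 8)) for _ in range(20)]
X = np.stack([aggregate(f) for f in images])   # 20 descriptors of dim 64

# PCA compaction: project onto the top 16 principal directions to shrink
# the descriptors and discard noisy dimensions (helps against overfitting).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
compact = Xc @ Vt[:16].T                       # 20 descriptors of dim 16
```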
101. References
— [SIFT] David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV 2004
— [SURF] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "Speeded-up robust features (SURF)," CVIU 2008
— [PCA-SIFT] Yan Ke and Rahul Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," CVPR 2004
— [GLOH] Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors," TPAMI 2005
— [RIFF] Gabriel Takacs, Vijay Chandrasekhar, Sam Tsai, David Chen, Radek Grzeszczuk, and Bernd Girod, "Unified real-time tracking and recognition with rotation-invariant fast features," CVPR 2010
— [ASIFT] Jean-Michel Morel and Guoshen Yu, "ASIFT: A new framework for fully affine invariant image comparison," SIAM Journal on Imaging Sciences, Vol. 2, No. 2, pp. 438-469, April 2009
Some figures in this material are reproduced from the papers listed here.
102. References
— [BRIEF] M. Calonder, V. Lepetit, and P. Fua, "BRIEF: Binary Robust Independent Elementary Features," ECCV 2010
— [BRISK] S. Leutenegger, M. Chli, and R. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints," ICCV 2011
— [ORB] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," ICCV 2011
— [FREAK] Alexandre Alahi, Raphael Ortiz, and Pierre Vandergheynst, "FREAK: Fast retina keypoint," CVPR 2012
— [D-BRIEF] Tomasz Trzcinski and Vincent Lepetit, "Efficient discriminative projections for compact binary descriptors," ECCV 2012
— [BinBoost] T. Trzcinski, M. Christoudias, V. Lepetit, and P. Fua, "Boosting binary keypoint descriptors," CVPR 2013
— [BOLD] V. Balntas, L. Tang, and K. Mikolajczyk, "BOLD - Binary online learned descriptor for efficient image matching," CVPR 2015
103. References
— [CARD] M. Ambai and Y. Yoshida, "CARD: Compact And Real-time Descriptors," ICCV 2011
— [LDA Hash] C. Strecha, A. M. Bronstein, M. M. Bronstein, and P. Fua, "LDAHash: Improved matching with smaller descriptors," TPAMI 2012
— [CNN Descriptor] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer, "Discriminative learning of deep convolutional feature point descriptors," ICCV 2015
— [CNN Similarity] S. Zagoruyko and N. Komodakis, "Learning to compare image patches via convolutional neural networks," CVPR 2015
— [Spectral Affine SIFT] Takahiro Hasegawa, Mitsuru Ambai, Kohta Ishikawa, Gou Koutaki, Yuji Yamauchi, Takayoshi Yamashita, and Hironobu Fujiyoshi, "Multiple-hypothesis affine region estimation with anisotropic LoG filters," ICCV 2015