
- 1. One-shot learning by inverting a compositional causal process. Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum. Presented by 能地宏 @NII. Note: the figures in these slides are taken from the paper.
- 2. (Excerpt from the paper's introduction and Figure 1.) People can classify new images of a foreign handwritten character from just one example, while classifiers are generally trained on hundreds of images per class, using benchmark datasets such as ImageNet [4] and CIFAR. Figure 1: Can you learn a new concept from just one example? (a & b) Where are the other examples of the concept shown in red? Answers for b) are row 4 column 3 (left) and row 2 column 4 (right).
- 3. (Figure: one-shot generation, example productions by people, HBPL, and the affine model.)
- 4. (Figure: one-shot generation, visual Turing test comparing human and model productions.)
- 5. (Figure: one-shot generation, example productions by people, the HBPL model, and the affine model.)
- 6. (Figure: one-shot generation, the model vs. people in the visual Turing test.)
- 7. Overview
‣ Humans can extract the characteristic features of a symbol from just one example
 - Classification: retrieve similar items
 - Generation: produce new samples
‣ Machine learning typically requires large amounts of data per label
 - ex) MNIST: 6000 training examples / class
‣ Task and contributions
 - Can machine learning mimic this human ability?
 - A carefully designed generative model obtains results similar to humans'
 - This suggests humans may extract features through a similar mechanism
- 8. Data and learning
Paper abstract (excerpt): a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of (simple) visual concepts, generalizing in human-like ways; it achieved a human-level error rate on a challenging one-shot classification task, outperforming deep learning models, and showed human-like performance at generating new examples in a "visual Turing test".
Omniglot dataset: 50 alphabets, over 1600 characters, 20 examples per character, drawn as 105x105 grayscale images paired with motor (stroke trajectory) data. (Figure 2: four alphabets from Omniglot, each with five characters drawn by four different people.)
The dataset was randomly split into a 30-alphabet "background" set and a 20-alphabet "evaluation" set, constrained such that the background set included the six most common alphabets as determined by Google hits. Background images, paired with their motor data, were used to learn the hyperparameters of the HBPL model, including a library of 1000 primitive motor elements (Figure 4a) and position models for a drawing's first, second, and third stroke, etc. (Figure 4c). Where possible, cross-validation within the background set was used to decide issues of model complexity within the conditional probability distributions of HBPL.
Image model: an affine transformation A^(m) in R^4 is sampled from P(A^(m)) = N([1, 1, 0, 0], Σ_A), where the first two elements control a global re-scaling and the second two control a global translation of the center of mass of T^(m). The transformed trajectories are rendered as a grayscale image using an ink model adapted from [10] (see Section SI-2), then perturbed by two noise processes, a Gaussian filter with standard deviation σ_b^(m) and pixel flipping with probability ε^(m), which make the gradient more robust during optimization and encourage partial solutions during classification. The binary image model is P(I^(m) | T^(m), A^(m), σ_b^(m), ε^(m)).
(Figure 4: learned hyperparameters. a) library of motor primitives; b) number of strokes; c) stroke start positions.)
- 9. Results first
One-shot classification (error rate, %): human 4.5, HBPL 4.8, affine 18.2, DBM 38, HD 34.8
‣ Better performance than deep learning
‣ Nearly the same error rate as humans
One-shot generation. Visual Turing test: judges look at nine drawings of the same symbol and guess which set was produced by a human; they answered correctly 56% of the time.
- 10. Model
Figure 3: An illustration of the HBPL model generating two character types (left and right), where the dotted line separates the type-level from the token-level variables. Legend: number of strokes κ, relations R, primitive id z (color-coded to highlight sharing), control points x (open circles), scale y, start locations L, trajectories T, transformation A, noise ε and σ_b, and image I.
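The figure separates type-level variables (number of strokes κ, primitive ids z, start locations L) from token-level copies that are perturbed per image. A toy sketch of this two-level sampling, with placeholder distributions that only illustrate the structure, not the paper's actual conditionals:

```python
import random

def sample_type(n_primitives=1000, rng=random):
    """Type level: a number of strokes and, per stroke, a primitive id
    and a start location (stand-ins for kappa, z, and L in the paper)."""
    kappa = rng.choice([1, 2, 3, 4])            # number of strokes
    strokes = [{"z": rng.randrange(n_primitives),
                "start": (rng.random(), rng.random())}
               for _ in range(kappa)]
    return {"kappa": kappa, "strokes": strokes}

def sample_token(char_type, jitter=0.02, rng=random):
    """Token level: perturb each stroke's start location and add a
    global re-scaling (placeholder for the paper's A, sigma_b, epsilon)."""
    token = {"strokes": [],
             "A": (1 + rng.gauss(0, 0.05),       # x-scale
                   1 + rng.gauss(0, 0.05))}      # y-scale
    for s in char_type["strokes"]:
        sx, sy = s["start"]
        token["strokes"].append({"z": s["z"],
                                 "start": (sx + rng.gauss(0, jitter),
                                           sy + rng.gauss(0, jitter))})
    return token
```

Sampling one type and then many tokens from it is what "one-shot generation" amounts to in this structure: the type is shared, the tokens vary.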
- 11. Learning the hyperparameters
‣ The model learns "common sense" about how symbols are drawn
‣ It uses motor data (videos of the drawing process)
(Figure 4: a) library of motor primitives; b) number of strokes; c) stroke start positions.)
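The primitive library (1000 motor elements in the paper) is learned from the background set's motor data. As a rough illustration of how such a library could be built, here is a k-means sketch over sub-strokes resampled to fixed-length vectors; the paper's actual learning procedure is richer than this:

```python
import numpy as np

def learn_primitives(substrokes, k=10, iters=20, rng=None):
    """Toy stand-in for learning the primitive library: k-means over
    sub-stroke trajectories, each already flattened to a fixed-length
    vector. (The paper's library has ~1000 primitives; this is a sketch.)"""
    rng = np.random.default_rng(rng)
    X = np.stack(substrokes)                              # (n, d)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (n, k)
        assign = dist.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)       # update centroid
    return centers, assign
```

Each returned center plays the role of one primitive; a stroke in a parse would then be labeled with the id of its nearest center.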
- 12. One-shot classification
‣ For each image, estimate the posterior over parses
‣ Then score the probability of generating the target image from each estimated type
Forty participants in the USA were tested on one-shot classification. On each trial, as in Figure 1b, participants were shown an image of a new character and asked to click on another image that shows the same character; trials came after two practice runs with the Latin and Greek alphabets, with no feedback, and characters never repeated across trials.
Posterior inference in this model is very challenging, since characters can be parsed in a large combinatorial space; an algorithm was developed for finding K high-probability parses from the most promising candidates proposed by a fast, bottom-up image analysis (detailed in Section SI-5).
Hierarchical Bayesian Program Learning: for a test image I^(T) and training images I^(c), c = 1, ..., 20, compute the Bayesian classification rule
  argmax_c log P(I^(T) | I^(c)),
approximated with the K parses of I^(c): the search algorithm provides the parses, the token-level variables θ^(T) are re-optimized for the test image, and the approximation can be improved by incorporating some of the local type-level variance. Since the token-level variables closely track the image and allow for little variability, it is inexpensive to draw conditional samples from the type level, and scoring does not require evaluating the full likelihood of the image.
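The classification rule above can be sketched as a loop over the training images, marginalizing over each image's K weighted parses in log space. Here `parse` and `log_lik` are assumed hooks standing in for the paper's parse search and token re-fitting, which are not reproduced:

```python
import math

def normalize_log_weights(log_scores):
    """w_i proportional to exp(log_score_i), with sum_i w_i = 1,
    computed stably via the max trick."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(test_image, train_images, log_lik, parse, K=5):
    """Sketch of argmax_c log P(I_test | I_c): for each candidate class,
    approximate the marginal over parses of I_c by its K best parses."""
    best_c, best_score = None, -math.inf
    for c, img in enumerate(train_images):
        parses = parse(img, K)                       # K candidate parses
        w = normalize_log_weights([p["log_score"] for p in parses])
        # log sum_i w_i * P(I_test | parse_i), computed in log space
        terms = [math.log(wi) + log_lik(test_image, p)
                 for wi, p in zip(w, parses) if wi > 0]
        m = max(terms)
        score = m + math.log(sum(math.exp(t - m) for t in terms))
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

With real hooks, `log_lik` would re-fit θ^(T) to the test image starting from the given parse, as the slide describes.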
- 13. Posterior inference
‣ Approximate the posterior by selecting candidate parses; each parse is scored under the prior and the scores are normalized
1. Perform random walks over the symbol to obtain candidate stroke parses (150 samples)
2. Score those candidates under the prior and keep the top K
An algorithm finds K high-probability parses, ψ^[1], θ^(m)[1], ..., ψ^[K], θ^(m)[K], from the most promising candidates proposed by a fast, bottom-up image analysis. These parses approximate the posterior with a discrete distribution,
  P(ψ, θ^(m) | I^(m)) ≈ Σ_{i=1}^K w_i δ(θ^(m) − θ^(m)[i]) δ(ψ − ψ^[i]),
where each weight w_i is proportional to the parse score, marginalizing over shape variables x:
  w_i ∝ w̃_i = P(ψ^[i], θ^(m)[i], I^(m)),
constrained such that Σ_i w_i = 1. Rather than using just a point estimate for each parse, the approximation can be improved by incorporating some of the local variance around it. Since the token-level variables θ^(m) closely track the image and allow for little variability, it is inexpensive to draw conditional samples from the type level P(ψ | θ^(m)[i], I^(m)) with the token level fixed. Metropolis-Hastings is run to produce N samples for each parse θ^(m)[i], denoted ψ^[i1], ..., ψ^[iN], giving the improved approximation
  P(ψ, θ^(m) | I^(m)) ≈ Q(ψ, θ^(m), I^(m)) = Σ_{i=1}^K Σ_{j=1}^N (w_i / N) δ(θ^(m) − θ^(m)[i]) δ(ψ − ψ^[ij]).
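Steps 1 and 2 above (draw many random-walk parses, keep the K best under the prior) can be sketched generically; `walk` and `score` are assumed hooks standing in for the paper's guided random walks over the image skeleton and its stroke prior:

```python
import heapq
import random

def propose_parses(walk, score, n_proposals=150, K=5, rng=random):
    """Bottom-up parse search sketch: draw many random-walk parse
    candidates, score each under the prior, keep the K highest-scoring."""
    candidates = [walk(rng) for _ in range(n_proposals)]
    return heapq.nlargest(K, candidates, key=score)
```

The returned K candidates are what the discrete posterior approximation on this slide is built from, with weights proportional to their (exponentiated) scores.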
- 14. Computing the score
‣ Using the estimated type variables, estimate the target image's token variables (via MCMC)
Figure 5: Parsing a raw image. a) The raw image (i) is processed by a thinning algorithm [18] (ii) and represented as an undirected graph [20] (iii), where parses are guided random walks (Section SI-5). b) The best parses found for that image (top row) are shown with their log w_j (Eq. 5), where numbers inside circles denote stroke order and starting position, and smaller open circles denote sub-stroke breaks. These parses were re-fit to three different raw images of characters (left in image triplets), where the best parse (top) and its associated image reconstruction (bottom right) are shown above its score (Eq. 9).
Given an approximate posterior for a particular image, the model can evaluate the posterior predictive score of a new image by re-fitting the token-level variables (Figure 5b).
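The MCMC re-fit described here can be sketched as a generic Metropolis-Hastings loop over token-level variables; `log_post` and the symmetric `proposal` are assumed hooks, not the paper's actual kernels:

```python
import math
import random

def mh_refine(init_token, log_post, proposal, n_samples=100, rng=random):
    """Metropolis-Hastings over token-level variables: starting from a
    fitted parse, draw samples around it to capture local variability.
    Assumes the proposal is symmetric, so the acceptance ratio is just
    the posterior ratio."""
    samples = []
    cur, cur_lp = init_token, log_post(init_token)
    for _ in range(n_samples):
        cand = proposal(cur, rng)
        cand_lp = log_post(cand)
        if math.log(rng.random()) < cand_lp - cur_lp:   # accept move
            cur, cur_lp = cand, cand_lp
        samples.append(cur)
    return samples
```

In the paper's setting, `init_token` would be one of the K parses re-fit to the target image, and the samples would feed the improved discrete approximation from the previous slide.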
- 15. Summary
‣ The model is heavily hand-engineered and somewhat ad hoc
‣ Still, it is interesting in showing that machine learning, like humans, can classify and generate from a single example
‣ It may also further our understanding of how humans extract features
