
ACL読み会2014@PFI (ACL Reading Group 2014 at PFI): "Less Grammar, More Features"


  1. Less Grammar, More Features
     David Hall, Greg Durrett and Dan Klein @ Berkeley
     能地 宏 (@nozyh), NII
  2. The claim of this paper
     ‣ To resolve ambiguity in low-layer NLP tasks, features drawn from word surface forms are sufficient
     - Sentiment analysis: better performance than deep learning (Socher et al. '13)
       [Figure: excerpt from Socher et al. '13, whose Recursive Neural Tensor Network predicts 5 sentiment classes (--, -, 0, +, ++) at every node of a parse tree]
     - Constituency parsing: better than the Berkeley parser in many languages
  3. Constituency parsing
     ‣ Infer the tree structure behind a sentence
     - A bottleneck for all higher-layer processing?
     - The goal is ambiguity resolution
  4. The goal is ambiguity resolution
     [Figure: two parse trees for "He eats sushi with chopsticks": on the left the PP attaches to the VP (VP → VP PP); on the right it attaches to the NP (NP → NP PP)]
  5. The goal is ambiguity resolution
     ‣ Both structures are grammatically valid
     - The human interpretation is the left tree (VP attachment), so the goal is to predict that structure
  6. A naive PCFG performs poorly
     ‣ Rule probabilities estimated from the treebank:
       VP → V NP    0.2
       NP → NP PP   0.15
       VP → VP PP   0.1
     ‣ VP attachment (correct):  0.1 × 0.2  = 0.02
     ‣ NP attachment (wrong):    0.2 × 0.15 = 0.03
     ‣ The PCFG prefers the wrong tree: a plain PCFG is insufficient for ambiguity resolution (F1 score: 72.1; a sketch of this calculation follows below)
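As a minimal sketch (not code from the paper), the slide's arithmetic can be reproduced directly: a PCFG scores each derivation by the product of its rule probabilities, so the attachment decision is driven by corpus-wide rule frequencies and ignores the words entirely.

```python
rule_prob = {
    ("VP", ("V", "NP")):  0.2,   # probabilities estimated from treebank counts
    ("NP", ("NP", "PP")): 0.15,
    ("VP", ("VP", "PP")): 0.1,
}

def derivation_score(rules):
    """A derivation's score is the product of its rule probabilities."""
    score = 1.0
    for rule in rules:
        score *= rule_prob[rule]
    return score

# Correct tree: the PP attaches to the VP ("eats ... with chopsticks").
vp_attach = derivation_score([("VP", ("VP", "PP")), ("VP", ("V", "NP"))])
# Wrong tree: the PP attaches to the NP ("sushi with chopsticks").
np_attach = derivation_score([("VP", ("V", "NP")), ("NP", ("NP", "PP"))])
print(vp_attach, np_attach)   # ≈ 0.02 vs ≈ 0.03 -- the wrong tree wins
```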
  7. Head lexicalization (Eisner '96; Collins '97)
     [Figure: lexicalized tree with head words propagated upward: S[eat], VP[eat], PP[with], NP[sushi], ...]
     • Propagates information from the leaf nodes upward
     • Can capture relations such as (eats, with)
     ‣ Drawbacks:
     • The number of rules becomes enormous
     • Poor extensibility to other languages (relies on head information)
  8. Latent annotation (state splitting) (Matsuzaki et al. '05; Petrov et al. '06)
     [Figure: tree with split symbols such as PP-1, VP-2, N-3, NP-4, VP-3]
     • Infers a latent state for each node
     • The approach implemented in the current Berkeley parser; F1 score: 90.2
  9. Summary of the methods so far
     ‣ The methods so far have basically extracted global information by increasing the number of CFG rules
     ‣ Lexicalization: annotate subtrees with head information (VP[eat], PP[with])
     - Shift-reduce methods also fall into this category (Zhang and Clark '09; Zhu et al. '13)
     ‣ Annotate nodes with coarse information (VP^S, PP^VP / VP-2, PP-1)
     - Based on linguistic analysis: Klein and Manning '03 (Stanford parser)
     - Inferred as latent variables with EM: Petrov et al. '06 (Berkeley parser)
  10. The approach of this work
      ‣ Is it possible to raise parsing accuracy while keeping annotation to a minimum?
      - Is annotating nodes with extra information really necessary for ambiguity resolution?
      ‣ Motivation
      - Lexicalized parsers need head information, which is unavailable for some languages (due to a lack of resources)
      - The Berkeley parser makes little use of word surface information
      - It is weak on morphologically rich languages (requires tuning)
      - Experiments show that the proposed method is more effective for multilingual parsing
  11. The approach of this work
      ‣ For most ambiguity resolution, isn't it enough to look at the surface forms around the span that a rule covers?
      ‣ Example features over the span "eats sushi with chopsticks":
      - [FIRSTWORD=eats × RULE=VP→V PP], [SPANLENGTH=5 × RULE=VP→V PP], [LASTWORD=chopsticks × RULE=VP→V PP]
      - [FIRSTWORD=eats × RULE=VP→V NP], [SPANLENGTH=5 × RULE=VP→V NP], [LASTWORD=chopsticks × RULE=VP→V NP]
        → for the incorrect (NP-attachment) analysis, we want negative weights to be learned
  12. Result overview
      ‣ English (Parseval F1 for the v = 1, h = 0 parser on Section 23 of the Penn Treebank):

                      Test ≤40   Test all
        Berkeley      90.6       90.1
        This work     89.9       89.2

      ‣ Multilingual data: SPMRL 2013 Shared Task (F1 for sentences of all lengths, using the evalb version distributed with the shared task):

                        Arabic  Basque  French  German  Hebrew  Hungarian  Korean  Polish  Swedish  Avg
        Dev, all lengths
        Berkeley        78.24   69.17   79.74   81.74   87.83   83.90      70.97   84.11   74.50    78.91
        Berkeley-Rep    78.70   84.33   79.68   82.74   89.55   89.08      82.84   87.12   75.52    83.28
        This work       78.89   83.74   79.40   83.28   88.06   87.44      81.85   91.10   75.95    83.30
        Test, all lengths
        Berkeley        79.19   70.50   80.38   78.30   86.96   81.62      71.42   79.23   79.18    78.53
        Berkeley-Tags   78.66   74.74   79.76   78.28   85.42   85.22      78.56   86.75   80.64    80.89
        This work       78.75   83.39   79.70   78.43   87.18   88.25      80.18   90.66   82.00    83.17

      ‣ Berkeley-Rep: the Berkeley parser with rare words replaced by feature representations tuned per language; the best single parser from Björkelund et al. '13, compared against on the development set only
  13. Model: CRF parsing (Finkel et al. '08)
      ‣ The probability of a tree T given a sentence w is

        p(T | w) ∝ exp( θ⊤ Σ_{r ∈ T} f(r, w) )

        where r ranges over the anchored rules used in the tree: an anchored rule is an unanchored grammar rule rule(r) together with the start, stop, and split indexes where it applies, span(r)
      ‣ Marginal probabilities are computed with Inside-Outside
      ‣ Training: AdaGrad + L2 regularization (online learning; a sketch follows below)
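A minimal sketch of the training step this slide implies, with illustrative names (this is not the epic parser's API): the gradient of the log-likelihood is the gold feature counts minus the expected counts, which the real parser obtains from Inside-Outside over the CKY chart; weights are then updated with AdaGrad plus L2.

```python
import math
from collections import defaultdict

def crf_gradient(gold_counts, expected_counts):
    """d log p(T_gold | w) / d theta = f(T_gold) - E_p[f(T)].
    expected_counts would come from Inside-Outside over the chart."""
    grad = defaultdict(float)
    for feat, count in gold_counts.items():
        grad[feat] += count
    for feat, count in expected_counts.items():
        grad[feat] -= count
    return grad

def adagrad_l2_step(theta, grad_history, grad, eta=1.0, lam=1e-4):
    """One online AdaGrad update on the regularized negative log-likelihood."""
    for feat, g in grad.items():
        g = -g + lam * theta[feat]          # flip sign: we minimize -loglik
        grad_history[feat] += g * g         # accumulated squared gradient
        theta[feat] -= eta * g / (math.sqrt(grad_history[feat]) + 1e-8)

theta, grad_history = defaultdict(float), defaultdict(float)
grad = crf_gradient({"FIRSTWORD=eats x RULE=VP->VP PP": 1.0},
                    {"FIRSTWORD=eats x RULE=VP->V NP": 0.7,
                     "FIRSTWORD=eats x RULE=VP->VP PP": 0.3})
adagrad_l2_step(theta, grad_history, grad)
```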
  14. Feature extraction
      ‣ Features are computed over the application of a rule anchored to a span, e.g. VP → VBD NP over "averted financial disaster":
      - Span properties: FIRSTWORD=averted, LASTWORD=disaster, LENGTH=3
      - Rule backoffs: PARENT=VP, RULE=VP→VBD NP
      - Conjunctions of the two, e.g. FIRSTWORD=averted × RULE=VP→VBD NP
      ‣ The score is the dot product of the feature vector with the weights
      - It corresponds to a PCFG rule probability and enters the CKY chart score (sketch below)
      ‣ The backbone is a minimal X-bar grammar whose only symbols are NP, NP-bar, VP, and so on; the base model has no surface features
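A minimal sketch of these feature templates (the feature strings are illustrative; the real parser hashes them into a weight vector): span properties are conjoined with rule backoffs, and the resulting dot product stands in for a PCFG rule probability in the CKY chart.

```python
def anchored_rule_features(words, rule, parent, start, split, end):
    """Features for applying `rule` over words[start:end], split at `split`."""
    span_props = [
        f"FIRSTWORD={words[start]}",
        f"LASTWORD={words[end - 1]}",
        f"LENGTH={end - start}",
        f"WORDBEFORE={words[start - 1] if start > 0 else '<s>'}",
        f"WORDAFTER={words[end] if end < len(words) else '</s>'}",
        f"WORDBEFORESPLIT={words[split - 1]}",
        f"WORDAFTERSPLIT={words[split]}",
    ]
    rule_backoffs = [f"PARENT={parent}", f"RULE={rule}"]
    # Every span property is conjoined with every rule backoff.
    return [f"{s} x {b}" for s in span_props for b in rule_backoffs]

def rule_score(theta, feats):
    """Dot product with the weights: the CKY chart score for this anchored rule."""
    return sum(theta.get(f, 0.0) for f in feats)

words = "averted financial disaster".split()
feats = anchored_rule_features(words, "VP -> VBD NP", "VP", 0, 1, 3)
# e.g. "FIRSTWORD=averted x RULE=VP -> VBD NP", "LENGTH=3 x PARENT=VP", ...
```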
  15. Which features are effective?
      ‣ F1 on the Penn Treebank development set (WSJ Section 22, sentences of length ≤ 40), for incrementally growing feature sets:

        Features                                       Section  F1
        RULE                                           4        73.0
        + SPAN FIRST WORD + SPAN LAST WORD + LENGTH    4.1      85.0
        + WORD BEFORE SPAN + WORD AFTER SPAN           4.2      89.0
        + WORD BEFORE SPLIT + WORD AFTER SPLIT         4.3      89.7
        + SPAN SHAPE                                   4.4      89.9

      ‣ The meaning of most features is intuitively clear
      ‣ The following slides use concrete examples to explain which sentences each feature helps with
  16. Word before/after span
      ‣ Example: "no read messages in his inbox" — is the POS of "read" VBP or JJ?
      ‣ When deciding the rule that spans "read messages", the information that "no" rarely precedes a VP is the cue: we want a negative weight to be learned for [WORDBEFORE=no × RULE=VP → VBP NNS]
  17. Word before/after split
      ‣ PP attachment: "has an impact on the market" — the split point of NP → NP PP falls right after "impact"
      ‣ "impact" is a noun that readily takes a PP modifier → we want a large weight to be learned for the monolexical feature conjoining "impact" (the word before the split) with the rule
      ‣ Exploits the fact that the head of a phrase tends to sit at one of its two ends (true in many languages; in Japanese, the head of a bunsetsu is at its right edge)
  18. Span shape
      ‣ Map each word in the span to a coarse character class (sketch below):
      - "( CEO of Enron )"       → (XxX)     (a PRN parenthetical)
      - 'said , " Too bad , "'   → x,“Xx,”   (a VP)
      ‣ Captures capitalization and brackets; (in English) helps identify named entities, match parentheses and quotes, etc.
      ‣ Parentheticals, quotes, and other punctuation-heavy, short constituents benefit from being explicitly modeled by such a descriptor
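A minimal sketch of how such a shape descriptor could be computed; the paper's exact scheme may differ in details such as truncation of long spans.

```python
def span_shape(tokens):
    """Map each token to a coarse class: X = capitalized, x = lowercase,
    N = digit; punctuation characters are kept verbatim."""
    out = []
    for tok in tokens:
        ch = tok[0]
        if ch.isupper():
            out.append("X")
        elif ch.isalpha():
            out.append("x")
        elif ch.isdigit():
            out.append("N")
        else:
            out.append(ch)
    return "".join(out)

print(span_shape("( CEO of Enron )".split()))                  # (XxX)
print(span_shape(["said", ",", "“", "Too", "bad", ",", "”"]))  # x,“Xx,”
```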
  19. On the meaning of "Less Grammar"
      ‣ It does not mean that linguistics can be discarded and the problem solved with machine learning alone
      - "Grammar" here refers to the size of the CFG rule set
      - The paper's claim is that a small grammar is sufficient if meaningful features are extracted from the surface forms
      - The machine learning used is simple (CRF + SGD-style online learning)
      ‣ Do existing methods actually require less linguistic knowledge to design?
      - Berkeley parser: state splitting via EM in a probabilistic model (fully automatic)
      - Shift-reduce: throw in every feature you can think of
  20. Aside: the direction of this research
      ‣ It looks like the same direction as their EMNLP 2013 coreference paper:
        "Easy Victories and Uphill Battles in Coreference Resolution", Greg Durrett and Dan Klein (Berkeley)
      - Coreference resolution can reach state-of-the-art accuracy with a discriminative model based only on features extracted from the surface forms between mentions (external knowledge such as WordNet is not needed)
        [Figure: "[Barack Obama]1 met with [David Cameron]2 . [He]1 said ..." with surface features such as "with [X] ." and ". [X] said"]
      - The Berkeley coreference tool is publicly available, with higher accuracy than Stanford's (supposedly)
      ‣ Many NLP analysis tasks can reach high accuracy if features are chosen well from the word surface forms
  21. Sentiment analysis (Socher et al. '13)
      ‣ Used Mechanical Turk to annotate 5-way sentiment labels on top of tree structures
      ‣ Showed that a neural net beats previous methods (last year's EMNLP)
      [Figure: a Recursive Neural Tensor Network predicting 5 sentiment classes (--, -, 0, +, ++) at every node of a parse tree, capturing negation and its scope]
  22. The paper's method applies directly
      ‣ Given the tree structure, classify each span into the 5 classes
      - Fix the structure and run Inside-Outside / CKY over it (sketch below)
      ‣ The first word of a span often carries the discourse relation: "but", "while", etc.
      [Figure: "While “Gangs” is never lethargic, it is hindered by its plot." — the feature 2 → (4 While ...) fires on the top span]
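A minimal sketch of the adaptation, with illustrative feature names: since the tree is fixed, inference reduces to scoring the five labels for each span with the same kind of surface features (the real system still runs structured inference over the fixed tree rather than labeling each span independently).

```python
LABELS = ["--", "-", "0", "+", "++"]   # very negative ... very positive

def best_label(theta, words, start, end):
    """Pick the highest-scoring sentiment label for words[start:end]."""
    def score(label):
        feats = [f"FIRSTWORD={words[start]} x LABEL={label}",
                 f"LASTWORD={words[end - 1]} x LABEL={label}",
                 f"LENGTH={end - start} x LABEL={label}"]
        return sum(theta.get(f, 0.0) for f in feats)
    return max(LABELS, key=score)

# A weight like this captures the discourse cue on the slide: a span
# starting with "While" tends to concede/flip the overall sentiment.
theta = {"FIRSTWORD=While x LABEL=-": 1.0}
words = "While Gangs is never lethargic , it is hindered by its plot .".split()
print(best_label(theta, words, 0, len(words)))   # "-"
```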
  23. Higher accuracy than the neural net
      ‣ Fine-grained sentiment results on the Stanford Sentiment Treebank of Socher et al. '13 (accuracy at the root / over all spans):

                                          Root   All spans
        Non-neutral dev (872 trees)
        Stanford CoreNLP current          50.7   80.8
        This work                         53.1   80.5
        Non-neutral test (1821 trees)
        Stanford CoreNLP current          49.1   80.2
        Stanford EMNLP 2013               45.7   80.7
        This work                         49.6   80.4

      ‣ For reference, another paper at this year's ACL:
        Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom: "A Convolutional Neural Network for Modelling Sentences"
      - A neural net that classifies sentiment without assuming a tree structure
      - 48.5 points on the test set (slightly below Stanford CoreNLP current)
  24. Summary
      ‣ It had been believed that high parsing accuracy requires annotating nodes with information and multiplying the rules
      ‣ Parsing with a minimal number of rules
      - Little dependence on the language or grammar → high portability to other languages
      - Adaptable to other tasks with small feature changes (sentiment)
      ‣ The parser is publicly available (epic parser): https://github.com/dlwh/epic
      ‣ Lessons to take away (?)
      - Information from word surface forms is (after all) extremely powerful
      - If meaningful features can be extracted, accuracy rivaling complex methods is achievable
