Chen (陳), 2019.5.30
1
Issue
• Goes straight to the tree as a neural model (vs. CKY / transition-based approaches)

1. A novel neural part:

• Parsing by syntactic distance: hinge loss for ranking

2. Greedy and recursive decoding part

• Top-down splitting: avoids compounding errors

(bottom-up combination is feasible, but less effective at avoiding such errors)

• F1 score: 91.8, in the mainstream of neural methods.
!2
Syntactic Distance
• Represents the syntactic relationships between all successive
pairs of words in a sentence.

1. Convert a parse tree into a
representation based on
distances between consecutive
words;

2. Map the inferred representation
back to a complete parse tree.
Training: (1)+(2) / Parsing: (2)
!3
Previous Studies - 1
• Serializing a tree into a sequence …

• Of syntactic tokens

using a generic seq2seq model as the parser (Vinyals et al., 2015)

• Of transitions / shift-reduce actions

producing an action/tag/label from the current state to form a tree
(Stern et al., 2017; Cross and Huang, 2016)

• Chart parsers

• Non-linear triangular potentials + dynamic programming
search (Gaddy et al., 2018)
!4
Treebank
( S
  ( NP ( PRP She ) )
  ( VP ( VBZ enjoys )
    ( S ( VP ( VBG playing )
      ( NP ( NN tennis ) ) ) ) )
  ( . . ) )
Input: She enjoys playing tennis .

+ POS tags: ( PRP VBZ VBG NN . )

Output: the tree-structured parse above
!5
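A worked example for this sentence (my own illustration; by Def. 2.1 below, only the relative order of the distances matters):

• (She, enjoys): split just below the root, second-largest distance

• (enjoys, playing): split at the outer VP, smaller

• (playing, tennis): split at the inner VP, smallest

• (tennis, .): split at the root S, largest

so any vector with d4 > d1 > d2 > d3, e.g. d = (3, 2, 1, 4), encodes this tree.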
Previous Studies - 2
• Transition-based models (Dyer et al., 2016)
Compounding errors cause further compounding errors, because the model is never exposed to its own mistakes (a.k.a. exposure bias, common to all stateful applications). Dynamic oracles and beam search provide only limited improvement (Goldberg and Nivre, 2012; Cross and Huang, 2016).

How about a mistake happening here? 😈
!6
Previous Studies - 3
Chart parser

• Good old CKY algorithm

• Training is free from previous decoding state, thanks to fencepost span features.

• Decoding is time-consuming.

• SOTA (Kitaev and Klein, 2018)

F1 > 95 (Self-Att x8 + BERT/ELMo)

F1 ~ 93 (Self-Att x8)

decoder implemented in Cython

(figure annotation: fenceposts → BiLSTM)
!7
Comparison
• Simplicity:

• BiLSTM x2; CNN x1; FFN x2 (vs. self-attention)

• Greedy decoding (vs. chart parsers)

• Sequence-level hinge loss (vs. chart parsers)

• Decoupled / state-free (vs. transition-based)

• High speed:

• Greedy decoding: O(n log n); observed time roughly O(n) (see the recurrence sketch after this slide)
!8
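A quick sanity check of the O(n log n) claim (my own reasoning, assuming a linear argmax scan per span):

    T(n) = Θ(n) + T(k) + T(n − k)

Balanced splits (k ≈ n/2) give T(n) = Θ(n log n); a degenerate chain gives the Θ(n²) worst case. The distances themselves come from a single parallel forward pass, hence the near-linear observed time.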
Model Architecture
!9
Syntactic Distance
1. Convert a parse tree into a
representation based on
distances between consecutive
words;

2. Map the inferred representation
back to a complete parse tree.
Training: (1)+(2) / Parsing: (2)
!10
Def 2.1:
• tree T;

• leaves (w0, …, wn) of T;

• height d̃_i^j of the lowest common ancestor of two leaves (w_i, w_j);

• syntactic distances d = (d_1, …, d_n), where d_i corresponds to the adjacent pair (w_{i−1}, w_i).

Relationship between heights and distances:

sign(d_i − d_j) = sign(d̃_{i−1}^i − d̃_{j−1}^j) for all i, j
!11
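A minimal Python sketch of step 1 (tree → distances), following the recursion this definition implies; the Node class and all names are my own illustration, not the authors' code:

    # Sketch: syntactic distances of a binarized tree (illustrative names).
    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class Node:
        label: str                     # constituent label, or POS tag at a leaf
        left: Optional['Node'] = None
        right: Optional['Node'] = None

    def tree_to_distances(node: Node) -> Tuple[List[int], List[str], int]:
        """Return (distances, internal-node labels, subtree height)."""
        if node.left is None:          # a leaf: height 0, no split below it
            return [], [], 0
        d_l, t_l, h_l = tree_to_distances(node.left)
        d_r, t_r, h_r = tree_to_distances(node.right)
        h = max(h_l, h_r) + 1          # height of the LCA created at this node
        # the split at this node separates the two halves at height h
        return d_l + [h] + d_r, t_l + [node.label] + t_r, h

    # "She enjoys playing tennis" (binarized, unary chains collapsed):
    # tree_to_distances(Node('S', Node('PRP'),
    #     Node('VP', Node('VBZ'), Node('VP', Node('VBG'), Node('NN')))))
    # -> ([3, 2, 1], ['S', 'VP', 'VP'], 3)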
Tree to tensors for training / tensor to tree for decoding & parsing — figure annotations:

• leaf node @ height 0

• height++ at each internal node

• concatenate distances in word order

• POS tags

• greedy split

• append sub-trees

Greedy top-down: n splits, each a search over the current span, O(n log n) on average (cf. the top-down parser of Stern et al., 2017a)

n-ary nodes: leftmost split (as in CNF binarization)

Leaf without syntactic label, but with its POS tag
!12
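A matching Python sketch of step 2 (distances → tree): greedily split at the largest distance, then recurse on each half. This gives only the unlabeled bracketing; the full algorithm also attaches a label per split. Names are mine:

    from typing import List

    def distances_to_tree(words: List[str], d: List[float]):
        """Greedy top-down decoding: d[i] sits between words[i] and words[i+1]."""
        if not d:                                    # a single word: a leaf
            return words[0]
        i = max(range(len(d)), key=d.__getitem__)    # split at the max distance
        left = distances_to_tree(words[:i + 1], d[:i])
        right = distances_to_tree(words[i + 1:], d[i + 1:])
        return (left, right)                         # unlabeled binary bracket

    # distances_to_tree(['She', 'enjoys', 'playing', 'tennis', '.'], [3, 2, 1, 4])
    # -> (('She', ('enjoys', ('playing', 'tennis'))), '.')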
Model equations, mapped to their roles (figure; a code sketch follows below):

(2) Prepare context

(3) Bottom syntactic labels

(4) Prepare relationships (1/2)

(5) Output distances

(6)(7) Prepare relationships (2/2); output syntactic labels for chunks
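A hedged PyTorch sketch of this pipeline (BiLSTM x2, CNN x1, FFN x2), with dimensions taken from the "Model settings" slide. Module names, the equation-number comments, and the omissions (e.g. no leaf-label head for POS tags / unary chains) are my guesses, not the authors' code:

    import torch
    import torch.nn as nn

    class DistanceParser(nn.Module):
        def __init__(self, n_words, n_tags, n_labels,
                     d_emb=400, d_hid=1200, win=2):
            super().__init__()
            self.word_emb = nn.Embedding(n_words, d_emb)
            self.tag_emb = nn.Embedding(n_tags, d_emb)
            # (2) prepare context: BiLSTM over word+tag embeddings
            self.lstm_w = nn.LSTM(2 * d_emb, d_hid // 2,
                                  batch_first=True, bidirectional=True)
            # (4) prepare relationships (1/2): CNN over adjacent positions
            self.conv = nn.Conv1d(d_hid, d_hid, kernel_size=win)
            # (6) prepare relationships (2/2): second BiLSTM over the n-1 gaps
            self.lstm_g = nn.LSTM(d_hid, d_hid // 2,
                                  batch_first=True, bidirectional=True)
            # (5) output distances / (7) output labels: one FFN head each
            self.dist_ffn = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(),
                                          nn.Linear(d_hid, 1))
            self.label_ffn = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(),
                                           nn.Linear(d_hid, n_labels))

        def forward(self, words, tags):        # both (batch, n) integer tensors
            x = torch.cat([self.word_emb(words), self.tag_emb(tags)], dim=-1)
            h, _ = self.lstm_w(x)              # (batch, n, d_hid) word contexts
            g = self.conv(h.transpose(1, 2)).transpose(1, 2)  # (batch, n-1, ...)
            g, _ = self.lstm_g(g)
            return self.dist_ffn(g).squeeze(-1), self.label_ffn(g)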
Hinge loss & ranking distances
• assert all(isinstance(di, int) for di in d), 'nature of heights'

• MSE loss:

• sum((di - dp) ** 2 for di, dp in zip(d, pred_d))

• Hinge loss for ranking (recall from Def. 2.1 that only the ordering of the distances matters):

L_rank = Σ_{i > j} max(0, 1 − sign(d_i − d_j) · (d̂_i − d̂_j))
!13
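A minimal Python sketch of this pairwise rank loss (gold distances d, predictions p; skipping tied gold pairs is my simplification):

    def sign(x):
        return (x > 0) - (x < 0)

    def rank_hinge_loss(d, p):
        """Pairwise hinge: demand a margin of 1 in the gold ordering direction."""
        loss = 0.0
        for i in range(len(d)):
            for j in range(i):
                s = sign(d[i] - d[j])
                if s:                  # skip tied gold distances (my choice)
                    loss += max(0.0, 1.0 - s * (p[i] - p[j]))
        return loss

    # rank_hinge_loss([3, 2, 1, 4], [2.0, 1.0, 0.0, 3.5]) == 0.0: the ordering
    # is respected with margin >= 1, while MSE would still penalize the offsets.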
Experiments
!14
!15
Penn Treebank / Chinese Treebank

Data settings
• PTB: train WSJ 2-21 (45k); devel 22 (1.7k); test 23 (2.4k)

• CTB: train 001-270, 440-1151; devel 301-325; test 271-300

Model settings (PTB / CTB)
• Word embedding: D = 400; rand_unk = 0.1

• LSTM: D = 1200; dropout = 0.2 / 0.1

• CNN: D = 1200; window = 2

• FFN: dropout = 0.3 / 0.4

• Adam: L2 beta = 1e-6
!16
(Results tables on PTB and CTB. Row legend: S2S = sequence-to-sequence; T.td / T.bu / T.in = transition-based, top-down / bottom-up / in-order; .gen = generative; C = chart-based, O(n³) decoding; C-alike = chart-like.)
!17
Hardware: an NVIDIA TITAN Xp for the neural network; an Intel Core i7-6850K CPU (3.60 GHz) for tree inference.
Conclusion
• Parallelization & "neuralization": greedy decoding

• Way to avoid exposure bias: decoupling

• Make “output variables conditionally independent given
the inputs.”
!18
