Deep Dependency Graph Conversion in English
15th International Workshop on Treebanks and Linguistic Theories
January 20th, 2017

Jinho D. Choi
Why Dependency Structure?
Many robust and scalable dependency parsers are available.
Parser    Reference                    Accuracy  Tokens / Sec.
Yara      Rasooli and Tetreault, 2015  89.32     9,838
Stanford  Chen and Manning, 2014       89.59     8,602
spaCy     Honnibal et al., 2013        90.86     13,963
NLP4J     Choi and McCallum, 2013      91.72     10,271
Comparison between greedy dependency parsers on OntoNotes.
It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool
Jinho D. Choi, Joel Tetreault, and Amanda Stent, ACL, 2015.
State-of-the-art achieved by a non-greedy parser: 92.50.
Parsing the entire English Wikipedia in 60 hours.
Why Dependency Structure?
It is considered “more” universal.
http://universaldependencies.org
48+ languages
391K tokens
Why Dependency Conversion?
Most English treebanks are annotated with constituency trees.
Treebank        Trees     Tokens
OntoNotes     138,566  2,620,495
BOLT           78,734    949,300
English Web    16,622    254,830
QuestionBank    4,000     38,188
THYME          88,893    936,166
SHARP          50,725    499,834
CRAFT          21,710    561,017
MIPACQ         19,141    269,178
GENIA          18,541
Total         436,932  6,129,008
vs. 391K tokens in UD
Covers 20+ genres
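The totals in the table above can be checked with a few lines of arithmetic. The sketch below only re-adds the figures shown on the slide; the names `TREEBANKS` and `UD_TOKENS` are illustrative, and GENIA's token count is set to `None` because the slide does not give it.

```python
# (trees, tokens) per treebank, copied from the table above.
# GENIA's token count is not given on the slide, so it is None here.
TREEBANKS = {
    "OntoNotes": (138_566, 2_620_495),
    "BOLT": (78_734, 949_300),
    "English Web": (16_622, 254_830),
    "QuestionBank": (4_000, 38_188),
    "THYME": (88_893, 936_166),
    "SHARP": (50_725, 499_834),
    "CRAFT": (21_710, 561_017),
    "MIPACQ": (19_141, 269_178),
    "GENIA": (18_541, None),
}

UD_TOKENS = 391_000  # "391K tokens" as quoted on the slide

total_trees = sum(trees for trees, _ in TREEBANKS.values())
total_tokens = sum(tokens for _, tokens in TREEBANKS.values() if tokens is not None)

print(total_trees)                            # 436932
print(total_tokens)                           # 6129008
print(round(total_tokens / UD_TOKENS, 1))     # 15.7
```

Both sums match the Total row, which suggests that the token total on the slide does not include GENIA. The constituency treebanks hold roughly 15.7 times as many tokens as the 391K quoted for UD.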
Towards Deep Structure
Most dependency parsing approaches have focused on tree parsing.
CoNLL 2006, 2007, 2008, 2009, 2017 shared tasks.
PROS: Allows efficient parsing models to be developed.
CONS: Cannot represent the complete relations.
PropBank: http://verbs.colorado.edu/propbank/
NomBank: http://nlp.cs.nyu.edu/meyers/NomBank.html
Abstract Meaning Representation: http://amr.isi.edu
Semantic Dependency Parsing: http://sdp.delph-in.net
Towards Deep Structure
PropBank
Abstract Meaning Representation
Semantic Dependency Parsing
NomBank
Focused on verbal predicates
Focused on nominal predicates
Added nominal and adjectival predicates
Semantically oriented
Penn Treebank
Syntactically oriented
Do not necessarily agree with Treebank
Relatively small
Limited to one genre
Deep Dependency Graph
Objectives
Consistent representations regardless of syntactic variations.
Rich predicate argument structures.
A large number of deep dependency graphs in multiple genres.
Predicates: Secondary, LightVerb
Arguments: Dative, Expletive, Passive, Coordination, Small Clause, Open Clause, Relative Clause
Auxiliaries: Modal, RaisingVerb
Secondary Predicate
Secondary predicate → function tag PRD.
Figure: Universal Dependency vs. Deep Dependency.
LightVerb Construction
Light verbs = {make, take, have, do, give, keep}
Eventive nouns are collected from PropBank.
Figure: Universal Dependency.
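A minimal sketch of how the two ingredients above could be combined. It assumes that a light verb construction is flagged when one of the six light verbs takes an eventive noun as its object; the slide does not spell out the exact test. `EVENTIVE_NOUNS` is a small hypothetical stand-in for the list collected from PropBank, and the function name is illustrative.

```python
# The six light verbs listed on the slide.
LIGHT_VERBS = {"make", "take", "have", "do", "give", "keep"}

# Hypothetical sample; the real list is collected from PropBank.
EVENTIVE_NOUNS = {"decision", "look", "walk", "call", "promise"}


def is_light_verb_construction(verb_lemma: str, object_lemma: str) -> bool:
    """Return True if a light verb takes an eventive noun as its object."""
    return verb_lemma in LIGHT_VERBS and object_lemma in EVENTIVE_NOUNS


print(is_light_verb_construction("make", "decision"))   # True
print(is_light_verb_construction("make", "cake"))       # False
print(is_light_verb_construction("reach", "decision"))  # False
```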
Dative
Dative → indirect object, DTV or BNF.
Expletive
Expletive → existential “there” or pleonastic “it”.
Existential “there”: part-of-speech tag EX.
Pleonastic “it”: empty category *EXP*.
Figures: Universal Dependency vs. Deep Dependency.
Passive Construction
Heuristics for LINK-PSV in PropBank.
Figure: Secondary Dependency.
Coordination
Arguments in coordination are explicitly represented in constituency trees.
Small Clause
Small clause → S consisting of only SBJ and PRD.
Figures: Universal Dependency vs. Deep Dependency.
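The rule above can be sketched over a toy constituency tree. The `Node` class and helper names are assumptions made for illustration, and labels are assumed to follow the Penn Treebank pattern of a category followed by hyphen-separated function tags (for example `NP-SBJ`).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    label: str                                   # e.g. "S", "NP-SBJ", "ADJP-PRD"
    children: List["Node"] = field(default_factory=list)


def category(label: str) -> str:
    """Syntactic category: the part of the label before the first hyphen."""
    return label.split("-")[0]


def function_tags(label: str) -> Set[str]:
    """Function tags of a label, ignoring numeric co-indices such as the 1 in NP-SBJ-1."""
    return {part for part in label.split("-")[1:] if part and not part.isdigit()}


def is_small_clause(node: Node) -> bool:
    """True if the node is an S whose only children are one SBJ and one PRD."""
    if category(node.label) != "S" or len(node.children) != 2:
        return False
    first, second = (function_tags(child.label) for child in node.children)
    return ("SBJ" in first and "PRD" in second) or ("PRD" in first and "SBJ" in second)


# "(consider) him honest": an S with only a subject and a predicate.
small = Node("S", [Node("NP-SBJ", [Node("PRP")]), Node("ADJP-PRD", [Node("JJ")])])
# An ordinary clause: subject plus verb phrase.
full = Node("S", [Node("NP-SBJ", [Node("NNP")]), Node("VP", [Node("VBD")])])

print(is_small_clause(small))  # True
print(is_small_clause(full))   # False
```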
Open Clause
Open clause → a clause without an internal subject.
Relative Clause
Relative clause → empty category *T*.
Heuristics for LINK-SLC in PropBank.
Modal Adjective
These arguments can be identified in constituency trees; they are either the siblings
of the coordinated verbs (e.g., the book and last year in Figure 11a) or the siblings
of the verb phrases that are the ancestors of these verbs (e.g., John in Figure 11a).
When the coordination is not on the same level, right node raising is used, which can
be identified by the empty category *RNR*-d. In Figure 11b, John is coordinated
across the verb phrase including value and the prepositional phrase including for.
Unlike the coordinated verbs in Figure 11a, which are siblings, these are not siblings,
so they need to be coordinated through right node raising. The coordinated arguments
are represented by the secondary dependencies to avoid multiple heads.
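One way to picture "secondary dependencies to avoid multiple heads" is a token that keeps exactly one primary head and stores any further heads in a separate list. The sketch below is an assumption about the data structure, not the converter's actual implementation: the `Token` class, the `add_arc` helper, the example sentence, and the choice of which conjunct receives the primary arc are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str
    head: Optional[int] = None       # index of the single primary head
    label: Optional[str] = None      # label of the primary arc
    secondary: List[Tuple[int, str]] = field(default_factory=list)


def add_arc(tokens: List[Token], dep: int, head: int, label: str) -> None:
    """Attach tokens[dep] to tokens[head]; only the first arc becomes primary."""
    token = tokens[dep]
    if token.label is None:
        token.head, token.label = head, label
    else:
        token.secondary.append((head, label))


# Hypothetical sentence, not the one shown in Figure 11a.
tokens = [Token(w) for w in ["John", "bought", "and", "read", "the", "book"]]

# Only the arcs relevant to the shared arguments are shown.
add_arc(tokens, 3, 1, "conj")   # read <- bought
add_arc(tokens, 0, 1, "nsbj")   # John <- bought (primary)
add_arc(tokens, 0, 3, "nsbj")   # John <- read   (secondary)
add_arc(tokens, 5, 1, "obj")    # book <- bought (primary)
add_arc(tokens, 5, 3, "obj")    # book <- read   (secondary)

print(tokens[0].head, tokens[0].secondary)   # 1 [(3, 'nsbj')]
print(tokens[5].head, tokens[5].secondary)   # 1 [(3, 'obj')]
```

The primary arcs alone still form a tree, so a tree parser can be trained on them, while the secondary list carries the extra argument relations.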
3.3 Auxiliaries
Modal adjective: Modal adjectives are connected with the class of modal verbs
such as can, may, or should that are used with non-modal verbs to express possibility,
permission, intention, etc.:
able     915   ready      105   prepared  32   due         24   glad       21
likely   235   happy       69   eager     30   sure        24   unwilling  20
willing  173   about       49   free      30   determined  22   busy       18
unable   165   reluctant   44   unlikely  28   afraid      22   qualified  16
Table 1: Top-20 modal adjectives and their counts from the corpora in Table 3.
Footnote 4: https://github.com/emorynlp/nlp4j
An adjectival predicate including an open clause whose external subject is the subject of the adjectival predicate.
RaisingVerb
Raising verb → empty category *-d to SBJ.
Raising verb: Unlike in most previous work, raising verbs modify
the “raised” verbs in DDG. A verb is considered a raising verb if 1) it is followed
by a clause whose subject is the empty category *-d, and 2) the antecedent of the
empty category is the subject of the raising verb. In Figure 13, the raising verbs go,
have, and keep are followed by the clauses whose subjects are the empty categories
*-1, *-2, and *-3, which all link to the same subject as the raised verb, study.
have      1,846   begin   825   stop  379   keep   158   prove   89
go        1,461   seem    787   be    322   use    157   turn    67
continue  1,210   appear  714   fail  233   get    136   happen  38
need      1,038   start   546   tend  168   ought   91   expect  38
Table 2: Top-20 raising verbs and their counts from the corpora in Table 3.
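The two conditions for a raising verb can be written as a small predicate. This is a sketch under simplifying assumptions: each verb is reduced to a hypothetical `Verb` record holding the co-index on its own subject and the surface form of the subject of the clause that follows it. A real implementation would read both from the constituency tree.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches an empty-category subject with a co-index, e.g. "*-1".
EMPTY_SUBJECT = re.compile(r"^\*-(\d+)$")


@dataclass
class Verb:
    lemma: str
    subject_index: Optional[int]     # co-index on the verb's own subject, e.g. NP-SBJ-1 -> 1
    clause_subject: Optional[str]    # subject of the clause following the verb, if any


def is_raising_verb(verb: Verb) -> bool:
    """Check conditions 1) and 2) from the definition above."""
    if verb.clause_subject is None:
        return False
    match = EMPTY_SUBJECT.match(verb.clause_subject)
    if match is None:
        return False                                     # 1) subject is not *-d
    return verb.subject_index == int(match.group(1))     # 2) antecedent is the verb's subject


print(is_raising_verb(Verb("seem", 1, "*-1")))   # True
print(is_raising_verb(Verb("seem", 1, "*-2")))   # False: antecedent is another phrase
print(is_raising_verb(Verb("say", 1, "she")))    # False: overt subject
```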
3.4 Semantic Roles
As shown in Figure 9a, semantic roles are extracted from certain function tags and
added to the terminal heads of the phrases that include such function tags. The
function tags used to extract semantic roles are: DIR: directional, EXT: extent, LOC:
locative, MNR: manner, PRP: purpose, and TMP: temporal.
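The extraction step can be illustrated with a lookup over the six function tags listed above. The sketch assumes phrase labels of the form category plus hyphen-separated tags (for example `PP-LOC`); the function name is illustrative, and locating the terminal head of the phrase, which needs head rules, is left out.

```python
# The six function tags named in the text and their glosses.
SEMANTIC_ROLE_TAGS = {
    "DIR": "directional",
    "EXT": "extent",
    "LOC": "locative",
    "MNR": "manner",
    "PRP": "purpose",
    "TMP": "temporal",
}


def semantic_roles(phrase_label: str) -> list:
    """Return the semantic-role tags found in a label such as 'PP-LOC' or 'NP-TMP-1'."""
    return [tag for tag in phrase_label.split("-")[1:] if tag in SEMANTIC_ROLE_TAGS]


print(semantic_roles("PP-LOC"))       # ['LOC']
print(semantic_roles("NP-TMP-1"))     # ['TMP']
print(semantic_roles("NP-SBJ"))       # []
```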
4 Analysis
4.1 Corpora
Six corpora that consist of Penn Treebank style constituency trees are used to
generate deep dependency graphs: OntoNotes (Weischedel et al. [38]), the English
Web Treebank (Web; Petrov and McDonald [35]), QuestionBank (Judge et al. [23]),
and the MiPACQ, Sharp, and Thyme corpora (Albright et al. [2]). Altogether, these
corpora cover 20 different genres including formal, colloquial, conversational, and
clinical documents, providing enough diversity to our dependency representation
Deep Dependency Labels
Type                    Label  Description                     Primary  Secondary
Subject                 csbj   Clausal subject                   5,291        123
                        expl   Expletive                        10,808          0
                        nsbj   Nominal subject                 298,418     71,383

Object                  comp   Clausal complement               86,884        105
                        dat    Dative                            6,763         87
                        obj    (Direct or preposition) object  205,149     20,785

Auxiliary               aux    Auxiliary verb                  148,829          0
                        cop    Copula                           81,661          0
                        lv     Light verb                        7,655          0
                        modal  Modal (verb or adjective)        49,259          0
                        raise  Raising verb                     10,598          0

Nominal and Quantifier  acl    Clausal modifier of nominal      24,791          7
                        appo   Apposition                       32,460         17
                        attr   Attribute                       352,939         14
                        det    Determiner                      334,784          0
                        num    Numeric modifier                 95,957          0
                        poss   Possessive modifier              62,489          0
                        relcl  Relative clause                  35,371          0

Adverbial               adv    Adverbial                       156,473      7,736
                        advcl  Adverbial clause                 49,503      1,750
                        advnp  Adverbial noun phrase            73,026        480
                        neg    Negation                         26,373      1,037
                        ppmod  Preposition phrase              371,927      4,471

Particle                case   Case marker                     420,045          0
                        mark   Clausal marker                   47,286          0
                        prt    Verb particle                    13,078          0

Coordination            cc     Coordinating conjunction        131,622          0
                        conj   Conjunct                        137,128          0

                        com    Compound word                   270,326          0
                        dep    Unclassified dependency          39,101          0
                        disc   Discourse element                14,834          0
                        meta   Meta element                     19,228          0
Conclusion
Contributions
Generate a large and diverse corpus of deep dependency graphs: consistent, rich, 6M+ tokens, 20+ genres.
Future Work
Integration with PropBank.
Development of graph parsing models.
Logic representation.
