Submit Search
Upload
Universal Dependencies For Inference
•
1 like
•
340 views
Valeria de Paiva
Follow
Talk at Nuance nuSVN seminar in Feb 2017
Read less
Read more
Education
Report
Share
Report
Share
1 of 31
Download now
Download to read offline
Recommended
Natural Language Inference in SICK
Natural Language Inference in SICK
Valeria de Paiva
Natural Language Inference: Logic from Humans
Natural Language Inference: Logic from Humans
Valeria de Paiva
Natural Language Processing
Natural Language Processing
Gabe Wilberscheid
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experiments
Vincenzo Lomonaco
French machine reading for question answering
French machine reading for question answering
Ali Kabbadj
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
Universidad Nacional de San Martin
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
cscpconf
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
csandit
Recommended
Natural Language Inference in SICK
Natural Language Inference in SICK
Valeria de Paiva
Natural Language Inference: Logic from Humans
Natural Language Inference: Logic from Humans
Valeria de Paiva
Natural Language Processing
Natural Language Processing
Gabe Wilberscheid
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experiments
Vincenzo Lomonaco
French machine reading for question answering
French machine reading for question answering
Ali Kabbadj
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
Universidad Nacional de San Martin
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
cscpconf
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
csandit
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
kevig
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
kevig
Machine Translation of Discontinuous Multiword Units
Machine Translation of Discontinuous Multiword Units
INESC-ID (Spoken Language Systems Laboratory - L2F)
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web Stack
Andre Freitas
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Vijay Prakash Dwivedi
OBSC Framework
OBSC Framework
Mouhamad Kawas
Challenges in transfer learning in nlp
Challenges in transfer learning in nlp
LaraOlmosCamarena
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
Daniel Walsh
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
CA API Management
Optimizing Near-Synonym System
Optimizing Near-Synonym System
Siyuan Zhou
Text Mining for Lexicography
Text Mining for Lexicography
Leiden University
NLP
NLP
Mohamed El-Serngawy
NLP
NLP
guestff64339
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
pathsproject
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Anubhav Jain
Using construction grammar in conversational systems
Using construction grammar in conversational systems
CJ Jenkins
Tutorial on word2vec
Tutorial on word2vec
Leiden University
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
An introduction to compositional models in distributional semantics
An introduction to compositional models in distributional semantics
Andre Freitas
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
devangmittal4
Dialectica Comonoids
Dialectica Comonoids
Valeria de Paiva
Dialectica Categorical Constructions
Dialectica Categorical Constructions
Valeria de Paiva
More Related Content
Similar to Universal Dependencies For Inference
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
kevig
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
kevig
Machine Translation of Discontinuous Multiword Units
Machine Translation of Discontinuous Multiword Units
INESC-ID (Spoken Language Systems Laboratory - L2F)
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web Stack
Andre Freitas
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Vijay Prakash Dwivedi
OBSC Framework
OBSC Framework
Mouhamad Kawas
Challenges in transfer learning in nlp
Challenges in transfer learning in nlp
LaraOlmosCamarena
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
Daniel Walsh
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
CA API Management
Optimizing Near-Synonym System
Optimizing Near-Synonym System
Siyuan Zhou
Text Mining for Lexicography
Text Mining for Lexicography
Leiden University
NLP
NLP
Mohamed El-Serngawy
NLP
NLP
guestff64339
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
pathsproject
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Anubhav Jain
Using construction grammar in conversational systems
Using construction grammar in conversational systems
CJ Jenkins
Tutorial on word2vec
Tutorial on word2vec
Leiden University
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
An introduction to compositional models in distributional semantics
An introduction to compositional models in distributional semantics
Andre Freitas
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
devangmittal4
Similar to Universal Dependencies For Inference
(20)
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
Machine Translation of Discontinuous Multiword Units
Machine Translation of Discontinuous Multiword Units
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web Stack
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
OBSC Framework
OBSC Framework
Challenges in transfer learning in nlp
Challenges in transfer learning in nlp
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
Agile in Context: Cynefin Framework,Three-circle model &the future of Agile
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
RESTing in the ALPS Mike Amundsen's Presentation from QCon London 2013
Optimizing Near-Synonym System
Optimizing Near-Synonym System
Text Mining for Lexicography
Text Mining for Lexicography
NLP
NLP
NLP
NLP
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Using construction grammar in conversational systems
Using construction grammar in conversational systems
Tutorial on word2vec
Tutorial on word2vec
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
An introduction to compositional models in distributional semantics
An introduction to compositional models in distributional semantics
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
More from Valeria de Paiva
Dialectica Comonoids
Dialectica Comonoids
Valeria de Paiva
Dialectica Categorical Constructions
Dialectica Categorical Constructions
Valeria de Paiva
Logic & Representation 2021
Logic & Representation 2021
Valeria de Paiva
Constructive Modal and Linear Logics
Constructive Modal and Linear Logics
Valeria de Paiva
Dialectica Categories Revisited
Dialectica Categories Revisited
Valeria de Paiva
PLN para Tod@s
PLN para Tod@s
Valeria de Paiva
Networked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better Science
Valeria de Paiva
Going Without: a modality and its role
Going Without: a modality and its role
Valeria de Paiva
Problemas de Kolmogorov-Veloso
Problemas de Kolmogorov-Veloso
Valeria de Paiva
Natural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and Machines
Valeria de Paiva
Dialectica Petri Nets
Dialectica Petri Nets
Valeria de Paiva
The importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in Portuguese
Valeria de Paiva
Negation in the Ecumenical System
Negation in the Ecumenical System
Valeria de Paiva
Constructive Modal and Linear Logics
Constructive Modal and Linear Logics
Valeria de Paiva
Semantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACT
Valeria de Paiva
NLCS 2013 opening slides
NLCS 2013 opening slides
Valeria de Paiva
Dialectica Comonads
Dialectica Comonads
Valeria de Paiva
Categorical Explicit Substitutions
Categorical Explicit Substitutions
Valeria de Paiva
Logic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for Dialog
Valeria de Paiva
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Valeria de Paiva
More from Valeria de Paiva
(20)
Dialectica Comonoids
Dialectica Comonoids
Dialectica Categorical Constructions
Dialectica Categorical Constructions
Logic & Representation 2021
Logic & Representation 2021
Constructive Modal and Linear Logics
Constructive Modal and Linear Logics
Dialectica Categories Revisited
Dialectica Categories Revisited
PLN para Tod@s
PLN para Tod@s
Networked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better Science
Going Without: a modality and its role
Going Without: a modality and its role
Problemas de Kolmogorov-Veloso
Problemas de Kolmogorov-Veloso
Natural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and Machines
Dialectica Petri Nets
Dialectica Petri Nets
The importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in Portuguese
Negation in the Ecumenical System
Negation in the Ecumenical System
Constructive Modal and Linear Logics
Constructive Modal and Linear Logics
Semantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACT
NLCS 2013 opening slides
NLCS 2013 opening slides
Dialectica Comonads
Dialectica Comonads
Categorical Explicit Substitutions
Categorical Explicit Substitutions
Logic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for Dialog
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Recently uploaded
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
Celine George
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
iammrhaywood
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
sanyamsingh5019
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
Maestría en Comunicación Digital Interactiva - UNR
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
christianmathematics
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
pragatimahajan3
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
Association for Project Management
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
PsychoTech Services
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
Jayanti Pande
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
SoniaTolstoy
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
RaunakKeshri1
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
GeoBlogs
microwave assisted reaction. General introduction
microwave assisted reaction. General introduction
Maksud Ahmed
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
misteraugie
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
EduSkills OECD
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
Thiyagu K
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
Admir Softic
Recently uploaded
(20)
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
microwave assisted reaction. General introduction
microwave assisted reaction. General introduction
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
Universal Dependencies For Inference
1.
© 2017 Nuance
Communications, Inc. All rights reserved. Universal Dependencies for Inference Valeria de Paiva AI & NLU Lab, Sunnyvale (joint work with A. Rademaker and F. Chalub, IBM Research, Brazil) February, 2017
2.
© 2017 Nuance
Communications, Inc. All rights reserved. 2 Outline • The corpus SICK • Universal Dependencies • Processing SICK with FreeLing & Parsey • Analysis • What’s Next?
3.
© 2017 Nuance
Communications, Inc. All rights reserved. 3 • Stands for Sentences Involved in Compositional Knowledge, • Result of 5-year European project COMPOSES(2011-2016) • SICK data set consists of about 10,000 English sentence pairs, from captions of images • original sentences were normalized to remove unwanted linguistic phenomena; sentences were expanded to obtain up to three new suitable for evaluation; all the sentences were paired to obtain the final data set. • Each sentence pair was annotated for (relatedness and) entailment by means of crowdsourcing techniques. • entailment annotation led to 5595 neutral pairs, 1424 contradiction pairs, and 2821 entailment pairs • All available from http://clic.cimec.unitn.it/composes/ sick.html The Corpus SICK
4.
© 2017 Nuance
Communications, Inc. All rights reserved. 4 – Simple, common sense like sentences: People are walking or One man in a big city is holding up a sign and begging for money. – After de-duplication of sentences we have 6076 sentences, with 512 unique verb lemmas, 269 unique adjectives, 148 unique adverbs and 1076 unique nouns. – Created from captions of pictures, SICK is about daily activities and entities; ones you can see in pictures taken by real people and which require common sense concepts: people, cats, dogs, running, climbing, eating, etc. The SICK corpus
5.
© 2017 Nuance
Communications, Inc. All rights reserved. 5 – Full details, code and data included, can be found in our GitHub repository https://github.com/own-pt/rte-sick – We run the sentences through FreeLing and use its tokenization to match the UD dependencies, obtained using Parsey McFace. – From the CoNLL representation of each sentence we obtain its bag-of- concepts using the Princeton WordNet (PWN) mappings provided by FreeLing. – We use FreeLing’s Word Sense Disambiguation (WSD) -- based on Aguirre’s UKB state-of-the-art Personalized PageRank algo -- to pick a favorite sense. – We use PWN to SUMO mappings to obtain logical concepts. Processing the sentences Parsey McFace, FreeLing
6.
© 2017 Nuance
Communications, Inc. All rights reserved. 6 Available from http://nlp.cs.upc.edu/freeling/ FreeLing – Established project, led by Lluís Padró, UPC (Universitat Politècnica de Catalunya), since 2004. latest release FreeLing 4.0, May 2016 – FreeLing is a C++ library providing language analysis functionalities: – morphological analysis, named entity detection, PoS-tagging, parsing, Word Sense Disambiguation, Semantic Role Labelling, etc. – for a variety of languages: English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, etc. – We have been using FreeLing since 2014 A multilingual, open source language analysis tool suite
7.
© 2017 Nuance
Communications, Inc. All rights reserved. 7 From https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds- most.html SyntaxNet – Project, led by Slav Petrov, Google Research, first release May 2016 – SyntaxNet, is an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. – Parsey McParseface is an English parser trained by SyntaxNet and used to analyze English text, producing Stanford dependencies. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. – SyntaxNet applies neural networks to the ambiguity problem using beam search in the model. A multilingual, open source language analysis tool suite, based on NNs
8.
© 2017 Nuance
Communications, Inc. All rights reserved. 8 Globally Normalized Transition-Based Neural Networks is their paper Parsey McParseface – Given some data from the Google supported Universal Dependencies project, you can train a parsing model on your own machine, they say. – On a standard benchmark (the 20 year old Penn Treebank), Parsey McParseface achieves 94% accuracy, beating their own previous state-of- the-art results. On Web text over 90% parse accuracy. (linguists trained for this task agree in 96-97% of the cases) – Coupled this with Manning’s famous 97% accuracy on pos-tagging… – But: Alice drove down the street in her car The English face of SyntaxNet
9.
© 2017 Nuance
Communications, Inc. All rights reserved. 9 “our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts” Slav Petrov Research Engineer, Google Research
10.
© 2017 Nuance
Communications, Inc. All rights reserved. 10 Processing of SICK – Want to do inference with SICK sentences – Want sentences into 3 buckets: – Prop, Prop&Prop, notProp – Not so easy to bucket them.. – Need to find all negations, need to find all conjunctions of propositions – Examples: A woman is rock climbing , pausing and calculating the route – vs – A man and a woman are walking through the woods 6076 sentences, but only 1814 unique lemmas, yay!
11.
© 2017 Nuance
Communications, Inc. All rights reserved. 11 Processing of SICK What the CoNLL representations look like? Similar to this 6076 sentences, but only 1814 unique lemmas, yay!
12.
© 2017 Nuance
Communications, Inc. All rights reserved. 12 Processing of SICK – SICK sentences were “expanded” from a core set of sentences in visual corpora, via human constructed transformations such as adding passive or active voice, adding negations, adding adjectival modifiers, etc. – Idea was to simplify the linguistic structure, and to create comparisons of different linguistic phenomena. – They say: [caption corpora] contain sentences (as opposed to paragraphs) that describe the same picture or video and are thus near paraphrases, are lean on named entities and rich in generic terms. – They call this process “normalization”, they provide us with 11 normalization rules. – Examples next slide How did they construct their corpus?
13.
© 2017 Nuance
Communications, Inc. All rights reserved. 13 Processing of SICK – SICK sentences were “expanded” from a core set of sentences in visual corpora, via human constructed transformations such as adding passive or active voice, adding negations, adding adjectival modifiers, etc. – Idea was to simplify the linguistic structure, and to create comparisons of different linguistic phenomena. – They say: [caption corpora] contain sentences (as opposed to paragraphs) that describe the same picture or video and are thus near paraphrases, are lean on named entities and rich in generic terms. – They call this process “normalization”, they provide us with 11 normalization rules. – Examples next slide How did they construct their corpus? (their LREC2014 paper)
14.
© 2017 Nuance
Communications, Inc. All rights reserved. 14 Processing of SICK – SICK sentences were “expanded” from a core set of sentences in visual corpora, via human constructed transformations such as adding passive or active voice, adding negations, adding adjectival modifiers, etc. – Idea was to simplify the linguistic structure, and to create comparisons of different linguistic phenomena. – They say: [caption corpora] contain sentences (as opposed to paragraphs) that describe the same picture or video and are thus near paraphrases, are lean on named entities and rich in generic terms. – They call this process “normalization”, they provide us with 11 normalization rules. – Examples next slide How did they construct their corpus? More “normalization” rules
15.
© 2017 Nuance
Communications, Inc. All rights reserved. 15 Processing of SICK 1. There are still non-sentences like “A couple standing on the curb”, rule #10. 2. There are 41 possessive dependencies (genitives like An animal is biting somebody 's hand) and 341 poss ones (for possessive pronouns) as in A man is holding a mask in *his* raised hand, rule #1. 3. There are still a few NE’s in the corpus (e.g “Seadoo”, a kind of jet ski) and ATVs (all terrain vehicles) rule #2 4. There are a few non “present continuous sentences” like (The speaker likes parrots) rule #3 5. Rule #4 ``Replace complex verb constructions into simpler ones” is difficult to judge. But examples similar to the one given (A man is attempting to surf down a hill made of sand è A man is surfing down a hill made of sand) happen in the corpus several times. We have 4 of ``trying to catch” and 37 xcomp dependency relations all together. 6. Rule #5 (Simplify verb phrases with modals and auxiliaries) is hard to check as modals are not marked in the dependencies, as far as I can see. 7. Rule #6 (Replace phrasal verbs with a synonym if verb and preposition are not adjacent) only helps a little. There are 324 particles, most of them corresponding to particle verbs that change meaning (e.g. A toddler is standing up ) gets one concept for standing and one for up. 8. Rule #7(Remove multiword expressions), we have 1148 noun-noun dependency relations and many are mwes. 9. Rule #8 (Remove dates and numbers) worked fairly well, we only have one temporal modifier (a sunny day) but 748 number dependencies and a need to tie this up to real numbers 10. Rule #9 (Turn subordinates into coordinates) also seems to have worked ok. 11. Rule #11(Remove indirect interrogative and parenthetical phrases). Well there are at least 4 real appositives, e.g Four middle eastern children , three girls and one boy , are climbing on the grotto with a pink interior. The construction rules didn’t work too well…
16.
© 2017 Nuance
Communications, Inc. All rights reserved. 16 Results of SICK – The curators of the corpus also made an effort to reduce the amount of “encyclopedic knowledge” about the world that is needed. – They say “To ensure the quality of the data set, all the sentences were checked for grammatical or lexical mistakes and disfluencies by a native English speaker.” – Reasonably well, we expect? – But the negations tell a different story. All (or almost all) the expletives (539 of them) are negative, but negation is not marked. This wouldn’t be a problem, if these sentences made sense. Many do not. How well did they do the job of simplifying the language?
17.
© 2017 Nuance
Communications, Inc. All rights reserved. 17 Processing of SICK Meaning-preserving Transformations (from LREC2014 paper again)
18.
© 2017 Nuance
Communications, Inc. All rights reserved. 18 Processing of SICK Meaning-altering Transformations
19.
© 2017 Nuance
Communications, Inc. All rights reserved. 19 Results of SICK According to stats we have 436 neg dependency relations. They really seem to correspond to negation using the adverb “not”. Clearly we are missing many negations (most if not all the 539 expletives are negative), but the determiners ‘no’ are not marked as negative. This we can correct semi-automatically. But there are many atrocious sentences that do not make sense at all: A person is ignoring the motocross bike that is lying on its side and there is no one is racing by; (maybe the third ``is” is a typo?) or There is no man wearing clothes that are covered with paint or is sitting outside in a busy area writing something. How can we measure how many bad sentences there are? We decided to investigate a reduced sub-corpus. Negation in SICK-PARSEY
20.
© 2017 Nuance
Communications, Inc. All rights reserved. 20 The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again. 3-4-5: a hand-checked sub-corpus – Only one conjunction: Paper and scissors both cut – 13 copulas (4 wrong of them are wrong: The man is training, The man is rock climbing, A woman is grating carrots, The woman is dicing garlic ) – 19 negations marked, but 26 neg expletives, 10 nobodies, 1 no person è should be 56 negations, needs work, we know. – Compounds? Have 20 nns, some are real compounds: – ping pong, golden retriever, sumo wrestlers, tiger cub, sumo ringers, baby pandas, cartoon airplane – Some are processing mistakes, gerund for the noun, lots (14 in 390) hamster singing, lion walking, panda climbing, etc. – Also 9 cases of particle verbs (need to update PWN?) 390 sentences (385 nsubj+ 5 nsubjpass) Problems pale in comparison to WSD
21.
© 2017 Nuance
Communications, Inc. All rights reserved. 21 – Mapping has many issues – QAD (quick and dirty, Tetrault&Lee) – Rewriting of CoNLL is necessary for most sentences, future work… Princeton WN to SUMO mapping – SUMO is intended as a foundation ontology for a variety of information processing systems. – First released in December 2000, it uses a version of the language SUO-KIF (first-order logic with bells&whistles) in a LISP-like syntax. – A mapping from WordNet synsets to SUMO which makes it special. – Many domain ontologies to extend it – “the largest formal public ontology in existence today.” Why SUMO? Language is important SUMO is open source, upper ontology
22.
© 2017 Nuance
Communications, Inc. All rights reserved. 22 Princeton WN-SUMO mapping text = People are walking 1 People people NOUN NNS_ 3 nsubj _NNS|07942152-n GroupOfPeople= 2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+ 3 walking walk VERB VBG_ 0 ROOT _ VBG|01904930-v|Walking= Kinds of mapping: 1. Equality: the synset corresponds exactly to the concept (as in Walking= above) 2. Containment: Concept+, the synset is a subconcept of the SUMO concept 3. Negation: Concept[ (the negation of the concept) silent= not communicating 4. A-box: shouldn’t be happening here, but it does. ok for Asian, German? Why SUMO? SUMO’s mappings: the theory
23.
© 2017 Nuance
Communications, Inc. All rights reserved. 23 Examples text = Men are sawing 1 Men man NOUN NNS_ 3 nsubj _ NNS|02472293-n|Hominid= 2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+ 3 sawing saw VERB VBG_ 0 ROOT _ VBG|01559590-v|Cutting+ “saw” is a very specific verb and has only one synset in PWN: 01559590-v saw | (cut with a saw; "saw wood for the fireplace") But SUMO has no exact match for this concept, so it maps to Cutting+, a kind of Cutting, the one done with a “saw”. Much worse is the situation with “Man” that occurs in 33 PWN synsets, the WSD alg decided for Hominid=, but Man= (equivalent mapping) would be much better. Containment
24.
© 2017 Nuance
Communications, Inc. All rights reserved. 24 Examples text = Some people are silent 1 Some some DET DT _ 2 det _ DT|?|? 2 people people NOUN NNS_ 4 nsubj _NNS|07942152- nGroupOfPeople= 3 are be VERB VBP _ 4 cop _ VBP|02604760-v|Entity+ 4 silent silent ADJ JJ _ 0 ROOT _ JJ|00942163-a| LinguisticCommunication[ “silent” is an adjective in 6 synsets, mapped to LinguisticCommunication[, which is to be read as non-LinguisticCommunication. This seems the wrong WSD, but the correct one would give us Communication[ (negated subsuming mapping). Negation
25.
© 2017 Nuance
Communications, Inc. All rights reserved. 25 Examples 1 An a DET DT _ 3 det _ DT|?|? 2 American american ADJ JJ _ 3 amod_ JJ|02927303-a|LandArea@ 3 footballer footballer NOUN NN _ 5 nsubj _ NN|10101634-n| ProfessionalAthlete+ 4 is be VERB VBZ _ 5 aux _ VBZ|02604760-v|Entity+ 5 wearing wear VERB VBG _ 0 ROOT _ VBG|00075021-v| RecreationOrExercise+ 6 the the DET DT _ 10 det _ DT|?|? 7 red red ADJ JJ _ 10 amod_ JJ|00381097-a|Red= 8 and and CONJ CC _ 7 cc _ CC|?|? 9 white white ADJ JJ _ 7 conj _ JJ|00393105-a|White= 10 strip strip NOUN NN _ 5 dobj _ NN|09449510-n|SelfConnectedObject+ The mapping of ‘American’ to related to the continent America is a reasonable hook for future work A-box
26.
© 2017 Nuance
Communications, Inc. All rights reserved. 26 Analysis The numbers Total SICK mappings: 38671 ? Most need to be discarded 29606 + most need to be made specific 16477 = 171 @ 67 [ Need to deal with negated, a-box concepts Need more bodily process (sleeping..), Need more baby-zoology (tiger, kitten not the same) But the killer is WSD
27.
© 2017 Nuance
Communications, Inc. All rights reserved. 27 Conclusion QAD with FreeLing WSD does NOT work Could repeat Petrov’s words. Could make a more careful corpus, checking its processing. Could rethink WSD. Have done so and included all the PWN senses, for the time being. It’s less wrong!! Considering how to make sure that ontologies pay for their upkeeping, without throwing away our Neural Networks gains.
28.
© 2017 Nuance
Communications, Inc. All rights reserved. 28 Future Work Enhancing dependencies with friends Want to use what we’ve learnt during the summer with the presentations of Schuster, Lewis, Reddy and Stanovsky.They all had rules that applied to dependencies to make them more `semantic’. We also have the AKR rules and the matcher rules to use and improve. At the moment, working on Stanovsky’s (and Fauke’s rules, they work for EN and DE) with Prateek Jain. Also working on the Entailment pairs of SICK. (note that simply concepts can give useful info2)
29.
© 2017 Nuance
Communications, Inc. All rights reserved. 29 CODA We are the UD-Portuguese now! There were two totally automatic treebanks in Portuguese in UDs. Zeman’s version started from Linguateca’s Bosque and did conversions. Google’s UD-BR used web data and no manual input. Started re-annotating Bosque in Oct 2016. We’re now the official UD-PT, as we automatically converted and manually checked (or are checking) Bosque. Paper submitted.
30.
© 2017 Nuance
Communications, Inc. All rights reserved. Thank you
31.
© 2017 Nuance
Communications, Inc. All rights reserved. 31
Download now