Universal Dependencies For Inference

© 2017 Nuance Communications, Inc. All rights reserved.
Universal Dependencies
for Inference
Valeria de Paiva AI & NLU Lab, Sunnyvale
(joint work with A. Rademaker and F. Chalub, IBM Research, Brazil)
February, 2017

© 2017 Nuance Communications, Inc. All rights reserved. 2
Outline
•  The corpus SICK
•  Universal Dependencies
•  Processing SICK with
FreeLing & Parsey
•  Analysis
•  What’s Next?

•  Stands for Sentences Involved in Compositional
Knowledge,
•  Result of 5-year European project COMPOSES(2011-2016)
•  SICK data set consists of about 10,000 English sentence
pairs, from captions of images
•  original sentences were normalized to remove unwanted
linguistic phenomena; sentences were expanded to obtain
up to three new suitable for evaluation; all the sentences
were paired to obtain the final data set.
•  Each sentence pair was annotated for (relatedness and)
entailment by means of crowdsourcing techniques.
•  entailment annotation led to 5595 neutral pairs, 1424
contradiction pairs, and 2821 entailment pairs
•  All available from http://clic.cimec.unitn.it/composes/
sick.html
The Corpus SICK

–  Simple, common sense like sentences: People are walking
or One man in a big city is holding up a sign and begging for
money.
–  After de-duplication of sentences we have 6076 sentences,
with 512 unique verb lemmas, 269 unique adjectives, 148
unique adverbs and 1076 unique nouns.
–  Created from captions of pictures, SICK is about daily
activities and entities; ones you can see in pictures taken by
real people and which require common sense concepts:
people, cats, dogs, running, climbing, eating, etc.
The SICK corpus

–  Full details, code and data included, can be found in our GitHub repository
https://github.com/own-pt/rte-sick
–  We run the sentences through FreeLing and use its tokenization to match the
UD dependencies, obtained using Parsey McFace.
–  From the CoNLL representation of each sentence we obtain its bag-of-
concepts using the Princeton WordNet (PWN) mappings provided by
FreeLing.
–  We use FreeLing’s Word Sense Disambiguation (WSD) -- based on Aguirre’s
UKB state-of-the-art Personalized PageRank algo -- to pick a favorite sense.
–  We use PWN to SUMO mappings to obtain logical concepts.
Processing the sentences
Parsey McFace, FreeLing

Available from http://nlp.cs.upc.edu/freeling/
FreeLing
–  Established project, led by Lluís Padró, UPC (Universitat Politècnica de Catalunya),
since 2004. latest release FreeLing 4.0, May 2016
–  FreeLing is a C++ library providing language analysis functionalities:
–  morphological analysis, named entity detection, PoS-tagging, parsing, Word Sense
Disambiguation, Semantic Role Labelling, etc.
–  for a variety of languages: English, Spanish, Portuguese, Italian, French, German,
Russian, Catalan, Galician, Croatian, Slovene, etc.
–  We have been using FreeLing since 2014
A multilingual, open source language analysis tool suite

From https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-
most.html
SyntaxNet
–  Project, led by Slav Petrov, Google Research, first release May 2016
–  SyntaxNet, is an open-source neural network framework implemented in TensorFlow
that provides a foundation for Natural Language Understanding (NLU) systems.
–  Parsey McParseface is an English parser trained by SyntaxNet and used to analyze
English text, producing Stanford dependencies. Given a sentence as input, it tags
each word with a part-of-speech (POS) tag that describes the word's syntactic
function, and it determines the syntactic relationships between words in the
sentence, represented in the dependency parse tree.
–  SyntaxNet applies neural networks to the ambiguity problem using beam search in
the model.
A multilingual, open source language analysis tool suite, based on
NNs

Globally Normalized Transition-Based Neural Networks is their paper
Parsey McParseface
–  Given some data from the Google supported Universal Dependencies project, you
can train a parsing model on your own machine, they say.
–  On a standard benchmark (the 20 year old Penn Treebank),
Parsey McParseface achieves 94% accuracy, beating their own previous state-of-
the-art results. On Web text over 90% parse accuracy.
(linguists trained for this task agree in 96-97% of the cases)
–  Coupled this with Manning’s famous 97% accuracy on pos-tagging…
–  But: Alice drove down the street in her car
The English face of SyntaxNet

“our work is still cut out for us: we
would like to develop methods that
can learn world knowledge and
enable equal understanding of
natural language across all
languages and contexts”
Slav Petrov
Research Engineer, Google Research

Processing of SICK
–  Want to do inference with SICK sentences
–  Want sentences into 3 buckets:
–  Prop, Prop&Prop, notProp
–  Not so easy to bucket them..
–  Need to find all negations, need to find all conjunctions
of propositions
–  Examples: A woman is rock climbing , pausing and
calculating the route
–  vs
–  A man and a woman are walking through the woods
6076 sentences, but only 1814 unique lemmas, yay!

Processing of SICK
What the CoNLL representations look like? Similar to this
6076 sentences, but only 1814 unique lemmas, yay!

Processing of SICK
–  SICK sentences were “expanded” from a core set of sentences in visual corpora, via
human constructed transformations such as adding passive or active voice, adding
negations, adding adjectival modifiers, etc.
–  Idea was to simplify the linguistic structure, and to create comparisons of different
linguistic phenomena.
–  They say: [caption corpora] contain sentences (as opposed to paragraphs) that
describe the same picture or video and are thus near paraphrases, are lean on
named entities and rich in generic terms.
–  They call this process “normalization”, they provide us with 11 normalization rules.
–  Examples next slide
How did they construct their corpus?

Processing of SICK
How did they construct their corpus? (their LREC2014 paper)

Processing of SICK
How did they construct their corpus? More “normalization” rules

Processing of SICK
1.  There are still non-sentences like “A couple standing on the curb”, rule #10.
2.  There are 41 possessive dependencies (genitives like An animal is biting somebody 's hand) and 341 poss ones
(for possessive pronouns) as in A man is holding a mask in *his* raised hand, rule #1.
3. There are still a few NE’s in the corpus (e.g “Seadoo”, a kind of jet ski) and ATVs (all terrain vehicles) rule #2
4. There are a few non “present continuous sentences” like (The speaker likes parrots) rule #3
5. Rule #4 ``Replace complex verb constructions into simpler ones” is difficult to judge. But examples similar to the one given
(A man is attempting to surf down a hill made of sand è A man is surfing down a hill made of sand) happen in the corpus
several times. We have 4 of ``trying to catch” and 37 xcomp dependency relations all together.
6. Rule #5 (Simplify verb phrases with modals and auxiliaries) is hard to check as modals are not marked in the dependencies,
as far as I can see.
7. Rule #6 (Replace phrasal verbs with a synonym if verb and preposition are not adjacent) only helps a little. There are 324
particles, most of them corresponding to particle verbs that change meaning (e.g. A toddler is standing up ) gets one concept
for standing and one for up.
8. Rule #7(Remove multiword expressions), we have 1148 noun-noun dependency relations and many are mwes.
9. Rule #8 (Remove dates and numbers) worked fairly well, we only have one temporal modifier (a sunny day) but 748 number
dependencies and a need to tie this up to real numbers
10. Rule #9 (Turn subordinates into coordinates) also seems to have worked ok.
11. Rule #11(Remove indirect interrogative and parenthetical phrases). Well there are at least 4 real appositives, e.g Four
middle eastern children , three girls and one boy , are climbing on the grotto with a pink interior.
The construction rules didn’t work too well…

Results of SICK
–  The curators of the corpus also made an effort to
reduce the amount of “encyclopedic knowledge” about
the world that is needed.
–  They say “To ensure the quality of the data set, all the
sentences were checked for grammatical or lexical
mistakes and disfluencies by a native English speaker.”
–  Reasonably well, we expect?
–  But the negations tell a different story. All (or almost all)
the expletives (539 of them) are negative, but negation
is not marked. This wouldn’t be a problem, if these
sentences made sense. Many do not.
How well did they do the job of simplifying the language?

Processing of SICK
Meaning-preserving Transformations (from LREC2014 paper again)

Processing of SICK
Meaning-altering Transformations

Results of SICK
According to stats we have 436 neg dependency relations. They really seem
to correspond to negation using the adverb “not”.
Clearly we are missing many negations (most if not all the 539 expletives
are negative), but the determiners ‘no’ are not marked as negative. This we
can correct semi-automatically.
But there are many atrocious sentences that do not make sense at all:
A person is ignoring the motocross bike that is lying on its side and
there is no one is racing by; (maybe the third ``is” is a typo?) or There is
no man wearing clothes that are covered with paint or is sitting outside in a
busy area writing something.
How can we measure how many bad sentences there are? We decided to
investigate a reduced sub-corpus.
Negation in SICK-PARSEY

The image cannot be displayed. Your computer may not have enough
memory to open the image, or the image may have been corrupted.
Restart your computer, and then open the file again. If the red x still
appears, you may have to delete the image and then insert it again.
3-4-5: a hand-checked sub-corpus
–  Only one conjunction:
Paper and scissors both cut
–  13 copulas (4 wrong of them are wrong:
The man is training, The man is rock climbing,
A woman is grating carrots, The woman is dicing garlic )
–  19 negations marked, but 26 neg expletives, 10 nobodies, 1
no person è should be 56 negations, needs work, we know.
–  Compounds? Have 20 nns, some are real compounds:
–  ping pong, golden retriever, sumo wrestlers, tiger cub, sumo
ringers, baby pandas, cartoon airplane
–  Some are processing mistakes, gerund for the noun, lots (14
in 390) hamster singing, lion walking, panda climbing, etc.
–  Also 9 cases of particle verbs (need to update PWN?)
390 sentences (385 nsubj+ 5 nsubjpass)
Problems pale in comparison to WSD

–  Mapping has many issues
–  QAD (quick and dirty, Tetrault&Lee)
–  Rewriting of CoNLL is necessary for
most sentences, future work…
Princeton WN to SUMO mapping
–  SUMO is intended as a foundation
ontology for a variety of information
processing systems.
–  First released in December 2000, it
uses a version of the language
SUO-KIF (first-order logic with
bells&whistles) in a LISP-like syntax.
–  A mapping from WordNet synsets to
SUMO which makes it special.
–  Many domain ontologies to extend it
–  “the largest formal public ontology in
existence today.”
Why SUMO? Language is important
SUMO is open source, upper ontology

Princeton WN-SUMO mapping
text = People are walking
1 People people NOUN NNS_ 3 nsubj _NNS|07942152-n GroupOfPeople=
2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+
3 walking walk VERB VBG_ 0 ROOT _ VBG|01904930-v|Walking=
Kinds of mapping:
1.  Equality: the synset corresponds exactly to the concept (as in Walking= above)
2.  Containment: Concept+, the synset is a subconcept of the SUMO concept
3.  Negation: Concept[ (the negation of the concept) silent= not communicating
4.  A-box: shouldn’t be happening here, but it does. ok for Asian, German?
Why SUMO?
SUMO’s mappings: the theory

Examples
text = Men are sawing
1 Men man NOUN NNS_ 3 nsubj _ NNS|02472293-n|Hominid=
2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+
3  sawing saw VERB VBG_ 0 ROOT _ VBG|01559590-v|Cutting+
“saw” is a very specific verb and has only one synset in PWN:
01559590-v saw | (cut with a saw; "saw wood for the fireplace")
But SUMO has no exact match for this concept, so it maps to Cutting+, a kind of Cutting,
the one done with a “saw”.
Much worse is the situation with “Man” that occurs in 33 PWN synsets, the WSD alg
decided for Hominid=, but Man= (equivalent mapping) would be much better.
Containment

Examples
text = Some people are silent
1 Some some DET DT _ 2 det _ DT|?|?
2 people people NOUN NNS_ 4 nsubj _NNS|07942152-
nGroupOfPeople=
3 are be VERB VBP _ 4 cop _ VBP|02604760-v|Entity+
4 silent silent ADJ JJ _ 0 ROOT _ JJ|00942163-a|
LinguisticCommunication[
“silent” is an adjective in 6 synsets, mapped to LinguisticCommunication[, which is to be
read as non-LinguisticCommunication. This seems the wrong WSD, but the correct one
would give us Communication[ (negated subsuming mapping).
Negation

Examples
1 An a DET DT _ 3 det _ DT|?|?
2 American american ADJ JJ _ 3 amod_ JJ|02927303-a|LandArea@
3 footballer footballer NOUN NN _ 5 nsubj _ NN|10101634-n|
ProfessionalAthlete+
4 is be VERB VBZ _ 5 aux _ VBZ|02604760-v|Entity+
5 wearing wear VERB VBG _ 0 ROOT _ VBG|00075021-v|
RecreationOrExercise+
6 the the DET DT _ 10 det _ DT|?|?
7 red red ADJ JJ _ 10 amod_ JJ|00381097-a|Red=
8 and and CONJ CC _ 7 cc _ CC|?|?
9 white white ADJ JJ _ 7 conj _ JJ|00393105-a|White=
10 strip strip NOUN NN _ 5 dobj _ NN|09449510-n|SelfConnectedObject+
The mapping of ‘American’ to related to the continent America is a reasonable hook for future work
A-box

Analysis
The numbers
Total SICK mappings:
38671 ? Most need to be discarded
29606 + most need to be made specific
16477 =
171 @
67 [
Need to deal with negated, a-box
concepts
Need more bodily process (sleeping..),
Need more baby-zoology (tiger, kitten
not the same)
But the killer is WSD

Conclusion
QAD with FreeLing WSD does NOT work
Could repeat Petrov’s words.
Could make a more careful corpus, checking its
processing.
Could rethink WSD. Have done so and included
all the PWN senses, for the time being.
It’s less wrong!!
Considering how to make sure that ontologies
pay for their upkeeping, without throwing away
our Neural Networks gains.

Future Work
Enhancing dependencies with friends
Want to use what we’ve learnt during the summer
with the presentations of Schuster, Lewis, Reddy
and Stanovsky.They all had rules that applied to
dependencies to make them more `semantic’. We
also have the AKR rules and the matcher rules to
use and improve.
At the moment, working on Stanovsky’s (and
Fauke’s rules, they work for EN and DE) with
Prateek Jain.
Also working on the Entailment pairs of SICK.
(note that simply concepts can give useful info2)

CODA
We are the UD-Portuguese now!
There were two totally automatic
treebanks in Portuguese in UDs.
Zeman’s version started from
Linguateca’s Bosque and did
conversions. Google’s UD-BR used
web data and no manual input.
Started re-annotating Bosque in Oct
2016. We’re now the official UD-PT,
as we automatically converted and
manually checked (or are checking)
Bosque. Paper submitted.

Universal Dependencies For Inference

Recommended

Recommended

More Related Content

Similar to Universal Dependencies For Inference

Similar to Universal Dependencies For Inference (20)

More from Valeria de Paiva

More from Valeria de Paiva (20)

Recently uploaded

Recently uploaded (20)

Universal Dependencies For Inference