SlideShare a Scribd company logo
1 of 30
Download to read offline
Semantic Structural Evaluation
for Text Simplification
Elior Sulem, Omri Abend and Ari Rappoport
The Hebrew University of Jerusalem
NAACL HLT 2018
2
Text Simplification
John wrote a book. I read the book.Last year I read the book John authored
Original sentence One or several simpler sentences
3
Text Simplification
John wrote a book. I read the book.Last year I read the book John authored
Original sentence One or several simpler sentences
Multiple motivations Preprocessing for Natural Language Processing tasks
e.g., machine translation, relation extraction, parsing
Reading aids, Language Comprehension
e.g., people with aphasia, dyslexia, second language learners
4
Two types of Simplification
John wrote a book. I read the book.Last year I read the book John authored
Original sentence One or several simpler sentences
Lexical operations
e.g., word substitution
Structural operations
e.g., sentence splitting, deletion
Here: the first automatic evaluation measure for structural simplification.
All the previous evaluation approaches targeted lexical simplification.
5
Overview
1. Current Text Simplification Evaluation
2. A New Measure for Structural Simplification
SAMSA (Simplification Automatic Measure through Semantic Annotation)
2.1. SAMSA properties
2.2 The semantic structures
2.3 SAMSA computation
3. Human Evaluation Benchmark
4. Correlation Analysis with Human Evaluation
5. Conclusion
6
Current Text Simplification Evaluation
Main automatic metrics
BLEU, Panineni et al., 2002
SARI, Xu et al., 2016
Reference-based
The output is compared to one or multiple references
Focus on lexical aspects
Do not take into account structural aspects
7
A New Measure for Structural Simplification
SAMSA
Simplification Automatic evaluation Measure
through Semantic Annotation
8
SAMSA Properties
● Measures the preservation of the sentence-level semantics
● Measures structural simplicity
● No reference simplifications
● Fully automatic
● Semantic parsing only on the source side
9
SAMSA Properties
Example:
John arrived home and gave Mary a call. (input)
John arrived home. John called Mary. (output)
Assumption:
In an ideal simplification each event is placed in a different sentence.
Fits with existing practices in Text Simplification.
(Glavaš and Štajner, 2013; Narayan and Gardent, 2014)
score
10
SAMSA Properties
Example:
John arrived home and gave Mary a call. (input)
John arrived home. John called Mary. (output)
SAMSA focuses on the core semantic components of the sentence,
and is tolerant to the deletion of other units.
score
11
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- Based on typological and cognitive theories
(Dixon, 2010, 2012; Langacker, 2008)
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
12
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- Stable across translations (Sulem, Abend and Rappoport, 2015)
- Used for the evaluation of MT and GEC (Birch et al., 2016; Choshen and
Abend, 2018)
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
13
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- Explicitly annotates semantic distinctions, abstracting away from syntax
(like AMR; Banarescu et al., 2013)
- Unlike AMR, semantic units are directly anchored in the text.
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
14
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- UCCA parsing (Hershcovich et al., 2017, 2018)
- Shared Task in Sem-Eval 2019!
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
15
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- Scenes evoked by a Main Relation (Process or State).
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
16
The Semantic Structures
Semantic Annotation: UCCA (Abend and Rappoport, 2013)
- A Scene may contain one or several Participants.
P
A
A
John arrived home
L
and
H H
F
gave a call to Mary
A
P
CE R C
A
Process (P) Function (F)
Participant (A) Parallel Scene (H)
Center (C) Linker (L)
Elaborator (E) Relator (R)
17
SAMSA Computation
Example:
John arrived home John gave Mary a call (input Scenes)
John arrived home. John called Mary. (output sentences)
1. Match each Scene to a sentence.
2. Give a score to each Scene assessing its meaning preservation in the
aligned sentence.
Evaluated through the preservation of its main semantic
components.
3. Average the scores and penalize non-splitting.
18
SAMSA Computation
Scene to Sentence Matching:
● A word alignment tool is used (Sultan et al., 2014) for aligning a Scene to
the candidate sentences.
Each word is aligned to 1 or 0 words in the candidate sentence.
● To each Scene we match the sentence for which the highest number of
word alignments is obtained.
● If there are more sentences than Scenes, a score of zero is assigned.
John arrived home John gave Mary a call (input Scenes)
John arrived home. John called Mary. (output sentences)
19
SAMSA Computation
John gave Mary a call
John called Mary
Word alignment UCCA annotation
[John]A
[gaveF
]P-
[Mary]A
[aE
callC
]-P(CONT.)
- Minimal center of the Main Relation (Process / State)
- Minimal center of the kth
Participant
Suppose the Scene Sc is matched to the sentence Sen:
Scene
Sentence
ScoreSen(Sc)=
1
2
(ScoreSen(MR)+
1
K
∑
i=1
K
ScoreSen (Park))
MR
Park
ScoreSen(u)=
1
0
u is aligned to a word in Sen
otherwise
20
SAMSA Computation
● Average over the input Scenes
● Non-splitting penalty:
We also experiment with SAMSAabl
, without non-splitting penalty.
nout
ninp
Number of output sentences
Number of input Scenes
21
Human Evaluation Benchmark
- 5 annotators
- 100 source sentences (PWKP test set)
- 6 Simplification systems + Simple corpus
- 4 Questions for each input-output pair (1 to 3 scale):
Is the output grammatical?
Does the output add information, compared to the input?
Does the output remove important information, compared to the input?
Is the output simpler than the input, ignoring the complexity of the words?
Qa
Qd
Qb
Qc
- Parameters: -Grammaticality (G)
-Meaning Preservation (P)
-Structural Simplicity (S)
22
Human Evaluation Benchmark
- 5 annotators
- 100 source sentences (PWKP test set)
- 6 Simplification systems + Simple corpus
- 4 Questions for each input-output pair (1 to 3 scale):
Is the output grammatical?
Does the output add information, compared to the input?
Does the output remove important information, compared to the input?
Is the output simpler than the input, ignoring the complexity of the words?
Qa
Qd
Qb
Qc
Human scores available at:
https://github.com/eliorsulem/SAMSA
AvgHuman = (G+P+S)
1
3
23
Correlation with Human Evaluation
SAMSA obtained the best correlation for AvgHuman.
SAMSAabl obtained the best correlation for Meaning Preservation.
Spearman’s correlation at the system level of the metric scores with the human evaluation scores,
considering the output of the 6 simplification systems
G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
Reference-less Reference-based
SAMSA
Semi-Aut.
SAMSA
Aut.
SAMSAabl
Semi-Aut.
SAMSAabl
Aut.
BLEU SARI Sent.
with Splits
G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09
P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49
S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83
AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14
24
Correlation with Human Evaluation
SAMSA is ranked second and third for Simplicity.
When resctricted to multi-Scene sentences, SAMSA Semi-Aut. has a correlation
of 0.89 (p=0.009). For Sent. with Splits, it is 0.77 (p=0.04).
Reference-less Reference-based
SAMSA
Semi-Aut.
SAMSA
Aut.
SAMSAabl
Semi-Aut.
SAMSAabl
Aut.
BLEU SARI Sent.
with Splits
G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09
P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49
S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83
AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14
Spearman’s correlation at the system level of the metric scores with the human evaluation scores,
considering the output of the 6 simplification systems
G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
25
Correlation with Human Evaluation
High similarity between the Semi-Automatic and the Automatic implementations.
For SAMSAabl, the ranking is the same.
Reference-less Reference-based
SAMSA
Semi-Aut.
SAMSA
Aut.
SAMSAabl
Semi-Aut.
SAMSAabl
Aut.
BLEU SARI Sent.
with Splits
G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09
P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49
S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83
AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14
Spearman’s correlation at the system level of the metric scores with the human evaluation scores,
considering the output of the 6 simplification systems
G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
26
Correlation with Human Evaluation
Low and negative correlations for BLEU and SARI.
Reference-less Reference-based
SAMSA
Semi-Aut.
SAMSA
Aut.
SAMSAabl
Semi-Aut.
SAMSAabl
Aut.
BLEU SARI Sent.
with Splits
G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09
P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49
S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83
AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14
Spearman’s correlation at the system level of the metric scores with the human evaluation scores,
considering the output of the 6 simplification systems
G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
27
Correlation with Existing Benchmark
QATS task (Štajner et al., 2016)
Pearson Correlation with the Overall Human Score:
●
Semi-automatic and automatic SAMSA rank 3rd
and 4th
(0.32 and 0.28),
out of 15 measures.
● Surpassed by the best performing systems by a small margin (0.33 and 0.34).
Although: - We did not use training data (human scores)
- SAMSA focuses on structural simplicity.
28
Conclusion
● We proposed SAMSA, the first structure-aware measure for Text Simplification.
● SAMSA explicitly targets the structural component of Text Simplification.
● SAMSA gets substantial correlations with human evaluation.
● Existing measures fail to correlate with human judgments when structural
simplification is performed.
29
● SAMSA can be used for tuning Text Simplification systems.
● Semantic decomposition with UCCA can be used for improving
Text Simplification (Sulem, Abend and Rappoport, ACL 2018).
● SAMSA can be extended to other Text-to-Text generation tasks
as paraphrasing, sentence compression, or fusion.
Future Work
30
Thank you
eliors@cs.huji.ac.il
Elior Sulem
www.cs.huji.ac.il/~eliors
Code and Data: https://github.com/eliorsulem/SAMSA

More Related Content

Similar to Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification

Svetlin Nakov - Improved Word Alignments Using the Web as a Corpus
Svetlin Nakov - Improved Word Alignments Using the Web as a CorpusSvetlin Nakov - Improved Word Alignments Using the Web as a Corpus
Svetlin Nakov - Improved Word Alignments Using the Web as a CorpusSvetlin Nakov
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingKaterina Vylomova
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deren Lei
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Marcin Junczys-Dowmunt
 
Artificial intelligence for Social Good
Artificial intelligence for Social GoodArtificial intelligence for Social Good
Artificial intelligence for Social GoodOana Tifrea-Marciuska
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...Lifeng (Aaron) Han
 
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdf
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdfINTRODUCTION AND HISTORY OF R PROGRAMMING.pdf
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdfranapoonam1
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...Lifeng (Aaron) Han
 
natural language processing
natural language processing natural language processing
natural language processing sunanthakrishnan
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfSarah Marie
 

Similar to Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification (20)

Svetlin Nakov - Improved Word Alignments Using the Web as a Corpus
Svetlin Nakov - Improved Word Alignments Using the Web as a CorpusSvetlin Nakov - Improved Word Alignments Using the Web as a Corpus
Svetlin Nakov - Improved Word Alignments Using the Web as a Corpus
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
 
Artificial intelligence for Social Good
Artificial intelligence for Social GoodArtificial intelligence for Social Good
Artificial intelligence for Social Good
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languages
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languagesMiguel Rios - 2015 - Obtaining SMT dictionaries for related languages
Miguel Rios - 2015 - Obtaining SMT dictionaries for related languages
 
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
TSD2013 PPT.AUTOMATIC MACHINE TRANSLATION EVALUATION WITH PART-OF-SPEECH INFO...
 
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdf
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdfINTRODUCTION AND HISTORY OF R PROGRAMMING.pdf
INTRODUCTION AND HISTORY OF R PROGRAMMING.pdf
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
 
natural language processing
natural language processing natural language processing
natural language processing
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdf
 
Nlp
NlpNlp
Nlp
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
main
mainmain
main
 

More from Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 

Recently uploaded (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 

Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification

  • 1. Semantic Structural Evaluation for Text Simplification Elior Sulem, Omri Abend and Ari Rappoport The Hebrew University of Jerusalem NAACL HLT 2018
  • 2. 2 Text Simplification John wrote a book. I read the book.Last year I read the book John authored Original sentence One or several simpler sentences
  • 3. 3 Text Simplification John wrote a book. I read the book.Last year I read the book John authored Original sentence One or several simpler sentences Multiple motivations Preprocessing for Natural Language Processing tasks e.g., machine translation, relation extraction, parsing Reading aids, Language Comprehension e.g., people with aphasia, dyslexia, second language learners
  • 4. 4 Two types of Simplification John wrote a book. I read the book.Last year I read the book John authored Original sentence One or several simpler sentences Lexical operations e.g., word substitution Structural operations e.g., sentence splitting, deletion Here: the first automatic evaluation measure for structural simplification. All the previous evaluation approaches targeted lexical simplification.
  • 5. 5 Overview 1. Current Text Simplification Evaluation 2. A New Measure for Structural Simplification SAMSA (Simplification Automatic Measure through Semantic Annotation) 2.1. SAMSA properties 2.2 The semantic structures 2.3 SAMSA computation 3. Human Evaluation Benchmark 4. Correlation Analysis with Human Evaluation 5. Conclusion
  • 6. 6 Current Text Simplification Evaluation Main automatic metrics BLEU, Panineni et al., 2002 SARI, Xu et al., 2016 Reference-based The output is compared to one or multiple references Focus on lexical aspects Do not take into account structural aspects
  • 7. 7 A New Measure for Structural Simplification SAMSA Simplification Automatic evaluation Measure through Semantic Annotation
  • 8. 8 SAMSA Properties ● Measures the preservation of the sentence-level semantics ● Measures structural simplicity ● No reference simplifications ● Fully automatic ● Semantic parsing only on the source side
  • 9. 9 SAMSA Properties Example: John arrived home and gave Mary a call. (input) John arrived home. John called Mary. (output) Assumption: In an ideal simplification each event is placed in a different sentence. Fits with existing practices in Text Simplification. (Glavaš and Štajner, 2013; Narayan and Gardent, 2014) score
  • 10. 10 SAMSA Properties Example: John arrived home and gave Mary a call. (input) John arrived home. John called Mary. (output) SAMSA focuses on the core semantic components of the sentence, and is tolerant to the deletion of other units. score
  • 11. 11 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - Based on typological and cognitive theories (Dixon, 2010, 2012; Langacker, 2008) P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 12. 12 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - Stable across translations (Sulem, Abend and Rappoport, 2015) - Used for the evaluation of MT and GEC (Birch et al., 2016; Choshen and Abend, 2018) P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 13. 13 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - Explicitly annotates semantic distinctions, abstracting away from syntax (like AMR; Banarescu et al., 2013) - Unlike AMR, semantic units are directly anchored in the text. P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 14. 14 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - UCCA parsing (Hershcovich et al., 2017, 2018) - Shared Task in Sem-Eval 2019! P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 15. 15 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - Scenes evoked by a Main Relation (Process or State). P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 16. 16 The Semantic Structures Semantic Annotation: UCCA (Abend and Rappoport, 2013) - A Scene may contain one or several Participants. P A A John arrived home L and H H F gave a call to Mary A P CE R C A Process (P) Function (F) Participant (A) Parallel Scene (H) Center (C) Linker (L) Elaborator (E) Relator (R)
  • 17. 17 SAMSA Computation Example: John arrived home John gave Mary a call (input Scenes) John arrived home. John called Mary. (output sentences) 1. Match each Scene to a sentence. 2. Give a score to each Scene assessing its meaning preservation in the aligned sentence. Evaluated through the preservation of its main semantic components. 3. Average the scores and penalize non-splitting.
  • 18. 18 SAMSA Computation Scene to Sentence Matching: ● A word alignment tool is used (Sultan et al., 2014) for aligning a Scene to the candidate sentences. Each word is aligned to 1 or 0 words in the candidate sentence. ● To each Scene we match the sentence for which the highest number of word alignments is obtained. ● If there are more sentences than Scenes, a score of zero is assigned. John arrived home John gave Mary a call (input Scenes) John arrived home. John called Mary. (output sentences)
  • 19. 19 SAMSA Computation John gave Mary a call John called Mary Word alignment UCCA annotation [John]A [gaveF ]P- [Mary]A [aE callC ]-P(CONT.) - Minimal center of the Main Relation (Process / State) - Minimal center of the kth Participant Suppose the Scene Sc is matched to the sentence Sen: Scene Sentence ScoreSen(Sc)= 1 2 (ScoreSen(MR)+ 1 K ∑ i=1 K ScoreSen (Park)) MR Park ScoreSen(u)= 1 0 u is aligned to a word in Sen otherwise
  • 20. 20 SAMSA Computation ● Average over the input Scenes ● Non-splitting penalty: We also experiment with SAMSAabl , without non-splitting penalty. nout ninp Number of output sentences Number of input Scenes
  • 21. 21 Human Evaluation Benchmark - 5 annotators - 100 source sentences (PWKP test set) - 6 Simplification systems + Simple corpus - 4 Questions for each input-output pair (1 to 3 scale): Is the output grammatical? Does the output add information, compared to the input? Does the output remove important information, compared to the input? Is the output simpler than the input, ignoring the complexity of the words? Qa Qd Qb Qc - Parameters: -Grammaticality (G) -Meaning Preservation (P) -Structural Simplicity (S)
  • 22. 22 Human Evaluation Benchmark - 5 annotators - 100 source sentences (PWKP test set) - 6 Simplification systems + Simple corpus - 4 Questions for each input-output pair (1 to 3 scale): Is the output grammatical? Does the output add information, compared to the input? Does the output remove important information, compared to the input? Is the output simpler than the input, ignoring the complexity of the words? Qa Qd Qb Qc Human scores available at: https://github.com/eliorsulem/SAMSA AvgHuman = (G+P+S) 1 3
  • 23. 23 Correlation with Human Evaluation SAMSA obtained the best correlation for AvgHuman. SAMSAabl obtained the best correlation for Meaning Preservation. Spearman’s correlation at the system level of the metric scores with the human evaluation scores, considering the output of the 6 simplification systems G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity Reference-less Reference-based SAMSA Semi-Aut. SAMSA Aut. SAMSAabl Semi-Aut. SAMSAabl Aut. BLEU SARI Sent. with Splits G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09 P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49 S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83 AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14
  • 24. 24 Correlation with Human Evaluation SAMSA is ranked second and third for Simplicity. When resctricted to multi-Scene sentences, SAMSA Semi-Aut. has a correlation of 0.89 (p=0.009). For Sent. with Splits, it is 0.77 (p=0.04). Reference-less Reference-based SAMSA Semi-Aut. SAMSA Aut. SAMSAabl Semi-Aut. SAMSAabl Aut. BLEU SARI Sent. with Splits G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09 P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49 S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83 AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14 Spearman’s correlation at the system level of the metric scores with the human evaluation scores, considering the output of the 6 simplification systems G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
  • 25. 25 Correlation with Human Evaluation High similarity between the Semi-Automatic and the Automatic implementations. For SAMSAabl, the ranking is the same. Reference-less Reference-based SAMSA Semi-Aut. SAMSA Aut. SAMSAabl Semi-Aut. SAMSAabl Aut. BLEU SARI Sent. with Splits G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09 P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49 S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83 AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14 Spearman’s correlation at the system level of the metric scores with the human evaluation scores, considering the output of the 6 simplification systems G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
  • 26. 26 Correlation with Human Evaluation Low and negative correlations for BLEU and SARI. Reference-less Reference-based SAMSA Semi-Aut. SAMSA Aut. SAMSAabl Semi-Aut. SAMSAabl Aut. BLEU SARI Sent. with Splits G 0.54 0.37 0.14 0.14 0.09 -0.77 0.09 P -0.09 -0.37 0.54 0.54 0.37 -0.14 -0.49 S 0.54 0.71 -0.71 -0.71 -0.60 -0.43 0.83 AvgHuman 0.58 0.35 0.09 0.09 0.06 -0.81 0.14 Spearman’s correlation at the system level of the metric scores with the human evaluation scores, considering the output of the 6 simplification systems G – Grammaticality, P – Meaning Preservation, S – Strucutral Simplicity
  • 27. 27 Correlation with Existing Benchmark QATS task (Štajner et al., 2016) Pearson Correlation with the Overall Human Score: ● Semi-automatic and automatic SAMSA rank 3rd and 4th (0.32 and 0.28), out of 15 measures. ● Surpassed by the best performing systems by a small margin (0.33 and 0.34). Although: - We did not use training data (human scores) - SAMSA focuses on structural simplicity.
  • 28. 28 Conclusion ● We proposed SAMSA, the first structure-aware measure for Text Simplification. ● SAMSA explicitly targets the structural component of Text Simplification. ● SAMSA gets substantial correlations with human evaluation. ● Existing measures fail to correlate with human judgments when structural simplification is performed.
  • 29. 29 ● SAMSA can be used for tuning Text Simplification systems. ● Semantic decomposition with UCCA can be used for improving Text Simplification (Sulem, Abend and Rappoport, ACL 2018). ● SAMSA can be extended to other Text-to-Text generation tasks as paraphrasing, sentence compression, or fusion. Future Work