Grammatical Agreement in SMT

  1. Grammatical Agreement in SMT
     Seminar Sprach-zu-Sprach-Übersetzung (Speech-to-Speech Translation), summer semester 2013
     Simon Hummel, Institut für Anthropomatik, Chair of Prof. Waibel, 24.06.13
  2. Inflection and Agreement
     • Inflection
         – Modification of a word
         – Signals grammatical variants (tense, gender, case, …)
         – e.g. walk vs. walked
     • Agreement
         – Inflections of related words in a sentence have to agree
         – e.g. das Haus (correct) vs. *die Haus (article and noun disagree in gender)
     • Some languages are weakly inflected (e.g. English)
     • Some are highly inflected (e.g. German, Arabic, …)
  3. Agreement Errors
     • Local agreement errors
         Ref:  the-car(F) go(F) with-speed
         Hypo: the-car(F) go(M) with-speed
     • Long-distance agreement errors
         Ref:  celle qui parle , c’est ma femme
               one(F) who speak , is my wife(F)
         Hypo: celui qui parle est ma femme
               one(M) who speak is my spouse(F)
     (F) and (M) mark feminine and masculine gender in the glosses.
  4. Approaches for SMT
     • Morphological Generation
         – Create raw stems and modify them with a predicted inflection
     • Agreement Constraints
         – Use an SCFG of the target language and add constraints to it
     • Class-based Agreement Model
         – Use morphological word classes such as “Noun+Def+Sg+Fem”
  5. Morphological Generation: Idea
     • “Generating Complex Morphology for Machine Translation” (Minkov and Toutanova, 2007)
     • Convert the MT output to a stem sequence
     • Predict an inflection for every stem
     • The predicted inflections should reflect the meaning and comply with agreement rules
  6. Morphological Generation: Lexicons
     • Used for morphological analysis and generation
     • Operations:
         – Stemming
         – Inflection
         – Morphological analysis
     • Can be created manually
     • Can be created automatically from data
     • Here: assumed as given
  7. Morphological Generation: Inflection Prediction
     • Maximum Entropy Markov model (2nd order):
         $p(\bar{y} \mid \bar{x}) = \prod_{t=1}^{n} p(y_t \mid y_{t-1}, y_{t-2}, x_t), \quad y_t \in I_t$
     • Features:
         – Monolingual
         – Bilingual
         – Lexical
         – Morphological
         – Syntactic
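The second-order factorization on the inflection prediction slide can be illustrated with a minimal sketch. It assumes a hypothetical `local_prob(y, prev1, prev2, x)` callable standing in for the trained maximum entropy model; the padding symbol, the toy uniform model and the candidate sets are illustrative assumptions, not taken from the slides.

```python
from typing import Callable, Dict, List, Sequence

START = "<s>"  # hypothetical padding symbol for positions before the first stem


def sequence_prob(
    inflections: Sequence[str],
    stems: Sequence[str],
    local_prob: Callable[[str, str, str, str], float],
) -> float:
    """Product over t of p(y_t | y_{t-1}, y_{t-2}, x_t)."""
    prob = 1.0
    prev1, prev2 = START, START
    for y, x in zip(inflections, stems):
        prob *= local_prob(y, prev1, prev2, x)
        prev1, prev2 = y, prev1  # shift the two-element history window
    return prob


def uniform_local_prob(candidates: Dict[str, List[str]]):
    """Toy local model: uniform over the candidate set I_t of each stem."""
    def local_prob(y: str, prev1: str, prev2: str, x: str) -> float:
        options = candidates[x]
        return 1.0 / len(options) if y in options else 0.0
    return local_prob


# Hypothetical candidate inflection sets I_t, keyed by stem.
CANDIDATES = {
    "das": ["das", "des"],
    "haus": ["Haus", "Hauses", "Häuser", "Häusern"],
}
model = uniform_local_prob(CANDIDATES)
p_valid = sequence_prob(["das", "Haus"], ["das", "haus"], model)
p_invalid = sequence_prob(["die", "Haus"], ["das", "haus"], model)
```

The toy model ignores the history arguments; a trained model would use them, together with the feature types listed on the slide, to prefer inflections that agree with their neighbours.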
  8. Morphological Generation: Evaluation
     • English-Russian and English-Arabic
     • Technical (software manual) domain
     • Input: aligned sentence pairs of reference translations (no output of an MT system) → reduces noise
     • Results reported as accuracy (%)
  9. Morphological Generation: Conclusion
     • Needed resources:
         – Large corpus of aligned sentence pairs
         – Lexicons (source and target) with the three operations
     + Better accuracy than a simple LM (even with little training data)
     + Easy to add to an existing MT system
     - Lexicons are expensive to create
  10. Constraints: Idea
     • “Agreement Constraints for Statistical Machine Translation into German” (Williams and Koehn, 2011)
     • String-to-tree model
     • Synchronous grammar for the target language
     • Learned constraints and probabilities are added to the grammar
     • Constraints are evaluated during decoding
  11. Constraints: Feature Structure
     • Feature structure
     • Unification
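As a rough illustration of the unification operation named on the slide above, the following sketch treats a feature structure as a flat dictionary of atomic values. This is a simplifying assumption: the feature names, the values and the `unify` helper are hypothetical and do not reproduce the representation used by Williams and Koehn.

```python
from typing import Dict, Optional

FeatureStructure = Dict[str, str]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the merged structure, or None if a shared feature clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values: unification fails
        result[feature] = value
    return result


# Hypothetical feature structures for a determiner and two nouns.
die = {"case": "nom", "number": "sg", "gender": "fem"}
katze = {"number": "sg", "gender": "fem"}
haus = {"number": "sg", "gender": "neut"}

agree = unify(die, katze)  # compatible: all shared features match
clash = unify(die, haus)   # gender differs, so unification fails
```

A failed unification (`None`) is what would signal an agreement violation between two rule elements in this simplified picture.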
  12. Constraints: Grammar
     • Synchronous grammar learned from a parallel corpus
     • Extended by constraints on the target side
     • Sample rule/constraint:
         NP-SB → the X_1 cat | die AP_1 Katze
  13. Constraints: Training
     • Propagation rules to capture NP/PP agreements
     • Applied bottom-up
  14. Constraints: Decoding
     • Model:
         $\hat{t} = \arg\max_{t} p(t \mid s)$
         $p(t \mid s) = \frac{1}{Z} \sum_{i=1}^{n} \lambda_i h_i(s, t)$
     • Every element of a rule/constraint has a feature structure
     • Constraint evaluation: each hypothesis stores the set of feature structures corresponding to its root rule element
     • Recombination of hypotheses is possible
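The decoding objective on the slide above picks the translation with the highest weighted sum of feature functions. A minimal sketch follows, assuming an explicit list of candidate translations instead of a real decoder search; the two toy feature functions and the weights are hypothetical. The normalizer Z is the same for every candidate, so it can be dropped from the argmax.

```python
from typing import Callable, Sequence

Feature = Callable[[str, str], float]


def score(source: str, target: str,
          features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Weighted feature sum: sum_i lambda_i * h_i(s, t), without the 1/Z factor."""
    return sum(w * h(source, target) for w, h in zip(weights, features))


def best_translation(source: str, candidates: Sequence[str],
                     features: Sequence[Feature], weights: Sequence[float]) -> str:
    """argmax over an explicit candidate list (a stand-in for decoder search)."""
    return max(candidates, key=lambda t: score(source, t, features, weights))


# Hypothetical toy feature functions h_i(s, t).
def length_penalty(s: str, t: str) -> float:
    return -float(abs(len(s.split()) - len(t.split())))


def agreement_indicator(s: str, t: str) -> float:
    # Toy stand-in for an agreement-sensitive feature.
    return 1.0 if "die Katze" in t else 0.0


candidates = ["die Katze", "das Katze", "die kleine Katze"]
best = best_translation("the cat", candidates,
                        [length_penalty, agreement_indicator], [1.0, 2.0])
```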
  15. Constraints: Evaluation
     • English-German
     • Europarl and News Commentary
     • Parsing: BitPar; alignment: GIZA++; SCFG rules: Moses toolkit
     • Treebank for the target language
     • Grammar: ~140 million rules
     • Results reported as BLEU scores and p-values for three test sets
  16. Constraints: Conclusion
     • Needed resources:
         – Parallel corpus
         – Heuristics for constraint extraction
     + Improvement in translation accuracy
     - The improvement is quite small
  17. Class-Based: Idea
     • “A Class-Based Agreement Model for Generating Accurately Inflected Translations” (Green and DeNero, 2012)
     • Applied during decoding
     • Target-side
     • Three steps:
         1. Segmentation
         2. Tagging
         3. Scoring
  18. Class-Based: Segmentation
     • Train a conditional random field (CRF)
     • Features: centered 5-character window
     • Applied during decoding, not as a preprocessing step
     • Labels:
         – I: Continuation (inside)
         – O: Outside (whitespace)
         – B: Beginning
         – F: Non-native characters
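The "centered 5-character window" feature on the segmentation slide can be sketched as a small extraction helper. The padding symbol and the function name are assumptions made for illustration, and the example string is a made-up transliteration; how the original system handles word boundaries is not stated in the slides.

```python
from typing import List

PAD = "_"  # hypothetical padding symbol for positions outside the string


def char_window(text: str, index: int, size: int = 5) -> List[str]:
    """Characters in a window of `size` centered on `index`, padded at the edges."""
    half = size // 2
    return [
        text[j] if 0 <= j < len(text) else PAD
        for j in range(index - half, index + half + 1)
    ]


first = char_window("wktb", 0)  # window around the first character
last = char_window("wktb", 3)   # window around the last character
```

Each character position would get one such window as its feature set, and the CRF would assign it one of the four labels I, O, B or F.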
  19. Class-Based: Tagging
     • Train a CRF on full sentences with gold classes
     • Features:
         – Current and previous words, affixes, etc.
     • Labels:
         – Morphological classes → gender, number, person, definiteness
         – e.g. 89 classes for Arabic
     • Example: “the car” is tagged “Noun+Def+Sg+Fem”
  20. Class-Based: Scoring
     • Scores of word sequences are not comparable across hypotheses
       → Score class sequences with a generative model
     • Simple bigram LM over gold class sequences (add-1 smoothed)
         $\hat{\tau} = \arg\max_{\tau} p(\tau \mid \hat{s})$
         $q(e) = p(\hat{\tau}) = \prod_{i=1}^{I} p(\hat{\tau}_i \mid \hat{\tau}_{i-1})$
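The add-1 smoothed bigram model over class sequences described on the scoring slide can be sketched as follows. The start symbol, the class name `BigramClassLM` and the tiny training set are assumptions for illustration; apart from “Noun+Def+Sg+Fem”, the class labels are made up.

```python
from collections import Counter
from typing import Iterable, Sequence

START = "<s>"  # hypothetical start-of-sequence symbol


class BigramClassLM:
    """Bigram model over morphological class sequences with add-1 smoothing."""

    def __init__(self, sequences: Iterable[Sequence[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab: set = set()
        for seq in sequences:
            prev = START
            for cls in seq:
                self.bigrams[(prev, cls)] += 1
                self.contexts[prev] += 1
                self.vocab.add(cls)
                prev = cls

    def prob(self, cls: str, prev: str) -> float:
        """p(cls | prev) = (count(prev, cls) + 1) / (count(prev) + |V|)."""
        return (self.bigrams[(prev, cls)] + 1) / (self.contexts[prev] + len(self.vocab))

    def score(self, seq: Sequence[str]) -> float:
        """Product over i of p(tau_i | tau_{i-1})."""
        p, prev = 1.0, START
        for cls in seq:
            p *= self.prob(cls, prev)
            prev = cls
        return p


# Toy "gold" class sequences (labels other than Noun+Def+Sg+Fem are hypothetical).
gold = [
    ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"],
    ["Noun+Def+Sg+Masc", "Verb+Sg+Masc"],
]
lm = BigramClassLM(gold)
agreeing = lm.score(["Noun+Def+Sg+Fem", "Verb+Sg+Fem"])
disagreeing = lm.score(["Noun+Def+Sg+Fem", "Verb+Sg+Masc"])
```

Because the model scores class sequences rather than words, a hypothesis whose verb class disagrees with its subject class receives a lower score even if the individual words are frequent.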
  21. Class-Based: Evaluation
     • English-Arabic
     • Training data: variety of sources (e.g. web)
     • Development and test: NIST sets (newswire and mixed genre [broadcast news, newsgroups, weblog])
     • Phrase-based decoder
     • Results reported as BLEU scores for the newswire sets and for the mixed-genre sets
  22. Class-Based: Conclusion
     • Needed resources:
         – Treebank for the target language (exists for many languages)
         – Large target corpus
     + Improves translation quality
     + Easy to integrate into an existing MT system
     - Increases decoding time
     - Not very good for mixed genres
  23. References
     • Green, S. and DeNero, J. (2012). “A Class-Based Agreement Model for Generating Accurately Inflected Translations”. In: ACL.
     • Williams, P. and Koehn, P. (2011). “Agreement Constraints for Statistical Machine Translation into German”. In: Sixth Workshop on Statistical Machine Translation.
     • Minkov, E. and Toutanova, K. (2007). “Generating Complex Morphology for Machine Translation”. In: ACL.