Using Parallel Propbanks to Enhance Word-Alignments
The 3rd Linguistic Annotation Workshop at ACL ’09
August 7th, 2009


Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Nianwen Xue (Brandeis University)
Parallel Propbanks
•   Propbank
    -    Corpus annotated with verbal propositions and their
         arguments (semantic roles)
        [Gansu Province] also actively [explored] [high risk business]
         Arg0: explorer                            Arg1: things explored


•   Parallel Propbanks
    -    Propbank annotations on both sides of a parallel corpus
        [Chinese sentence corresponding to the English example above; its characters were lost in extraction]
          Arg0                           Arg1

Word-Alignments
•   Given parallel sentences, find the translation of each word

    [Chinese sentence; characters lost in extraction]

    Construction is a principal economic activity in developing Pudong


•   GIZA++: a statistical machine translation toolkit
    -   It is hard to verify if the alignments are correct.

    -   Words with low frequencies may not get aligned.

    -   It does not account for semantics.



Predicate Matching (based on GIZA++)
•    English Chinese Parallel Treebank (ECTB)
    -     Xinhua: Chinese newswire + literal translation

    -     Sinorama: Chinese news magazine + non-literal translation

        [Two pie charts; legend: En.verb, En.be, En.else, En.none]
        Xinhua: 12,895       slices: 45%, 32%, 19%, 3%
        Sinorama: 40,086     slices: 56%, 22%, 19%, 3%


Top-down Argument Matching
•   Verify word-alignments
    -   For each Chinese verb vc aligned to some English verb ve

    -   Accept the alignment as correct if the arguments of
        vc and ve match

        Chinese:  [Arg0] [ArgM] [ArgM] [Rel] [Arg1]    (characters lost in extraction)

        English:  [Gansu Province] [also] [actively] [explored] [high risk business]
                   Arg0             ArgM   ArgM       Rel        Arg1

                                      Bingo!
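
The verification step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Predicate` class, the `matched_fraction` score, the token indices, and the word alignment are all hypothetical. An argument of vc counts as matched when it shares a role label with an argument of ve and at least one of its words is aligned into that argument.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Predicate:
    """A verb plus its Propbank arguments (hypothetical structure)."""
    verb: int  # token index of the verb (Rel)
    # (role label, token indices); a list because ArgM may occur more than once
    args: list[tuple[str, set[int]]] = field(default_factory=list)


def matched_fraction(vc: Predicate, ve: Predicate,
                     alignment: set[tuple[int, int]]) -> float:
    """Fraction of vc's arguments that have a same-label argument of ve
    containing at least one aligned word. An assumed score, for illustration."""
    if not vc.args:
        return 0.0
    matched = 0
    for label, c_tokens in vc.args:
        for e_label, e_tokens in ve.args:
            if e_label == label and any(
                (c, e) in alignment for c in c_tokens for e in e_tokens
            ):
                matched += 1
                break
    return matched / len(vc.args)


def verify(vc: Predicate, ve: Predicate,
           alignment: set[tuple[int, int]], threshold: float = 0.5) -> bool:
    """Keep the vc-ve alignment only if enough arguments match."""
    return matched_fraction(vc, ve, alignment) >= threshold


# Hypothetical token indices for the Gansu Province example.
# English: Gansu(0) Province(1) also(2) actively(3) explored(4) high(5) risk(6) business(7)
vc = Predicate(3, [("Arg0", {0}), ("ArgM", {1}), ("ArgM", {2}), ("Arg1", {4, 5, 6})])
ve = Predicate(4, [("Arg0", {0, 1}), ("ArgM", {2}), ("ArgM", {3}), ("Arg1", {5, 6, 7})])
# Hypothetical (Chinese index, English index) word alignments
alignment = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)}

print(matched_fraction(vc, ve, alignment), verify(vc, ve, alignment))
```

With every argument matched, the alignment between the two verbs is kept; with no supporting word alignments between the arguments it would be rejected.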

Bottom-up Argument Matching
      •   Expand word-alignments
          -    For each Chinese verb vc aligned to no English word

          -    Align vc to ve such that ve is an English verb that maximizes
               the argument matching with vc



        Chinese:  [Arg0] [ArgM] [ArgM] [ArgM] [Arg1] [Rel]    (characters lost in extraction)

        English:  [Foreign funded enterprises in Gansu Province] [no] [longer] [worry] [about investment risk]
                   Arg0                                          ArgM ArgM     Rel     Arg1

          -    The same English sentence contains a second candidate verb, "funded", with its own arguments:

        [Foreign] [funded] [enterprises] in Gansu Province no longer worry about investment risk
         ArgM      Rel      Arg1

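
The selection among candidate English verbs can be sketched as below. This is a simplified illustration, not the authors' method: `label_overlap` compares only the multisets of role labels and stands in for the real argument matching score, and the function names and the 0.5 threshold are assumptions made for the example.

```python
from collections import Counter


def label_overlap(vc_labels, ve_labels):
    """Share of vc's role labels that also occur among ve's role labels
    (multiset intersection). A stand-in score, assumed for illustration."""
    if not vc_labels:
        return 0.0
    shared = Counter(vc_labels) & Counter(ve_labels)
    return sum(shared.values()) / len(vc_labels)


def best_english_verb(vc_labels, candidates, threshold=0.5):
    """Return (verb, score) for the candidate with the highest score,
    or (None, score) if the best score falls below the threshold."""
    best_verb, best_score = None, 0.0
    for verb, labels in candidates.items():
        score = label_overlap(vc_labels, labels)
        if score > best_score:
            best_verb, best_score = verb, score
    if best_score < threshold:
        return None, best_score
    return best_verb, best_score


# Role labels of the unaligned Chinese verb in the example sentence
vc_labels = ["Arg0", "ArgM", "ArgM", "ArgM", "Arg1"]
# Role labels of the two English verbs in the parallel sentence
candidates = {
    "worry": ["Arg0", "ArgM", "ArgM", "Arg1"],
    "funded": ["ArgM", "Arg1"],
}

chosen, score = best_english_verb(vc_labels, candidates)
print(chosen, score)
```

Under this stand-in score, "worry" shares four of the five labels and "funded" only two, so the Chinese verb would be aligned to "worry".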
Argument Matching Score
•   Macro argument matching score




•   Micro argument matching score




•   Thresholds
    -   Top-down: thresholds on macro score

    -   Bottom-up: thresholds on both macro and micro scores
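
The score formulas did not survive on this slide, so the sketch below only illustrates the usual macro/micro distinction and should be read as an assumption rather than the authors' definition. Here the macro score averages a per-argument ratio, while the micro score pools word counts over all arguments; all function names and the toy data are hypothetical. The threshold values are the ones reported on the evaluation slides (Mac.th = 0.5 for top-down; Mac.th = 0.8 and Mic.th = 0.6 for bottom-up).

```python
def argument_hits(c_args, e_args, alignment):
    """For each Chinese argument, count how many of its words are aligned
    to a word inside an English argument with the same role label.
    Returns a list of (hits, argument_size) pairs."""
    counts = []
    for label, c_tokens in c_args:
        e_tokens = set()
        for e_label, tokens in e_args:
            if e_label == label:
                e_tokens |= tokens
        hits = sum(
            1 for c in c_tokens if any((c, e) in alignment for e in e_tokens)
        )
        counts.append((hits, len(c_tokens)))
    return counts


def macro_score(counts):
    """Average of per-argument ratios: every argument weighs the same."""
    ratios = [hits / size for hits, size in counts if size > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def micro_score(counts):
    """Pooled ratio: every word weighs the same."""
    total = sum(size for _, size in counts)
    return sum(hits for hits, _ in counts) / total if total else 0.0


# Hypothetical arguments (role label, token indices) and word alignments
c_args = [("Arg0", {0}), ("Arg1", {2, 3, 4, 5})]
e_args = [("Arg0", {0, 1}), ("Arg1", {3, 4})]
alignment = {(0, 0), (2, 3)}

counts = argument_hits(c_args, e_args, alignment)
macro, micro = macro_score(counts), micro_score(counts)

top_down_ok = macro >= 0.5                     # threshold on macro only
bottom_up_ok = macro >= 0.8 and micro >= 0.6   # thresholds on both scores
print(macro, micro, top_down_ok, bottom_up_ok)
```

In this toy case the short Arg0 is fully matched and the long Arg1 only partly, so the macro score (0.625) is higher than the micro score (0.4); the pair would pass the top-down threshold but not the stricter bottom-up ones.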



System Overview
    Source Language Corpus + Target Language Corpus
        -> GIZA++ -> Word Alignments

    Word Alignments + Parallel Propbanks
        -> Verbs aligned to verbs   -> Top-down Matching  -> Verified Alignments
        -> Verbs aligned to no word -> Bottom-up Matching -> Expanded Alignments

    Verified Alignments + Expanded Alignments -> Enhanced Alignments

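
The routing step in this overview can be sketched as follows. The function name, the data layout, and the extra `other` bucket are assumptions for illustration; the slides do not say what happens to verbs aligned only to non-verbs.

```python
def route_verbs(chinese_verbs, english_verbs, alignment):
    """Split Chinese verbs by what the word aligner linked them to.

    chinese_verbs / english_verbs: sets of token indices that are verbs
    alignment: set of (Chinese index, English index) pairs
    """
    top_down, bottom_up, other = [], [], []
    for vc in sorted(chinese_verbs):
        targets = {e for c, e in alignment if c == vc}
        aligned_verbs = sorted(targets & english_verbs)
        if aligned_verbs:
            # Verb aligned to a verb: candidate pairs for top-down verification
            top_down.extend((vc, ve) for ve in aligned_verbs)
        elif not targets:
            # Verb aligned to no word: candidate for bottom-up expansion
            bottom_up.append(vc)
        else:
            # Aligned only to non-verbs: left untouched in this sketch
            other.append(vc)
    return top_down, bottom_up, other


# Hypothetical indices for one sentence pair
chinese_verbs = {3, 9, 12}
english_verbs = {4, 10}
alignment = {(0, 0), (3, 4), (12, 7)}

top_down, bottom_up, other = route_verbs(chinese_verbs, english_verbs, alignment)
print(top_down, bottom_up, other)
```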
Evaluations
•   Test Corpus
    -   NIST-GALE Web Genre Test Data

    -   100 parallel sentences, 365 verb tokens, 273 verb types

•   Measurements
    -   Term Coverage
        : how many Chinese verb-types are covered

    -   Term Expansion
        : how many English verb-types are suggested

    -   Alignment Accuracy
        : how many suggested English verb-types are correct
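
One possible reading of these three measurements is sketched below. The exact counting rules are not given on the slide, so the definitions here are assumptions: coverage counts Chinese verb types with at least one suggestion, expansion counts all suggested English verb types, and accuracy is averaged over the covered Chinese verb types. The verb identifiers and judgments are made-up toy data.

```python
def evaluate(suggestions, correct_pairs):
    """suggestions: Chinese verb type -> set of suggested English verb types
    correct_pairs: set of (Chinese type, English type) pairs judged correct
    Returns (term coverage, term expansion, average alignment accuracy)."""
    covered = {vc: ves for vc, ves in suggestions.items() if ves}
    term_coverage = len(covered)
    term_expansion = sum(len(ves) for ves in covered.values())
    per_verb_accuracy = [
        sum((vc, ve) in correct_pairs for ve in ves) / len(ves)
        for vc, ves in covered.items()
    ]
    average_accuracy = (
        sum(per_verb_accuracy) / len(per_verb_accuracy)
        if per_verb_accuracy else 0.0
    )
    return term_coverage, term_expansion, average_accuracy


# Toy data with placeholder Chinese verb types vc1..vc3
suggestions = {
    "vc1": {"explore", "probe"},
    "vc2": {"worry"},
    "vc3": set(),  # no suggestion, so not covered
}
correct_pairs = {("vc1", "explore"), ("vc2", "worry")}

coverage, expansion, accuracy = evaluate(suggestions, correct_pairs)
print(coverage, expansion, accuracy)
```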



Evaluations: Top-down
                                  Mac.th = 0.0 (GIZA++)    Mac.th = 0.5 (TDAM)
    Term Coverage
        Xinhua                            79                       76
        Sinorama                         129                       62
    Average Alignment Accuracy
        Xinhua                          83.35%                   83.71%
        Sinorama                        57.76%                   78.09%

Evaluations: Bottom-up
    Mac.th = 0.8, Mic.th = 0.6

                                      Xinhua       Sinorama
    Term Coverage                       18            27
    Average Alignment Accuracy        63.89%        14.46%

    Chart annotations: 5.5% error-reduction, 17% abs-improvement

Conclusions & Future Work
•   Conclusions
    -   Top-down Argument Matching is most effective for verifying
        word-alignments based on non-literal translations that have
        proven difficult for GIZA++.

    -   Bottom-up Argument Matching shows promise for expanding
        the coverage of GIZA++ alignments based on literal
        translations.

•   We will try to enhance word-alignments by using
    -   Automatically labeled Propbanks

    -   Nombanks, Named-entity tags

    -   Parallel Propbanks prior to GIZA++


Acknowledgements
•   We gratefully acknowledge the support of the National
    Science Foundation Grants IIS-0325646 (Domain
    Independent Semantic Parsing) and CISE-CRI-0551615
    (Towards a Comprehensive Linguistic Annotation), and a
    grant from the Defense Advanced Research Projects
    Agency (DARPA/IPTO) under the GALE program,
    DARPA/CMO Contract No. HR0011-06-C-0022,
    subcontract from BBN, Inc.
•   Special thanks to Daniel Gildea and Ding Liu (University of
    Rochester), who provided word-alignments; Wei Wang
    (Information Sciences Institute at University of Southern
    California), who provided the test corpus; and Hua
    Zhong (University of Colorado at Boulder), who
    performed the evaluations.
