SlideShare a Scribd company logo
1 of 1
Download to read offline
Using Global Constraints and Reranking to Improve Cognates Detection
Michael Bloodgood1
Benjamin Strauss2
1
Department of Computer Science
The College of New Jersey
2
Computer Science and Engineering Dept.
The Ohio State University
Introduction
Background Cognates detection is the task of identifying words across languages that
have a common origin. Cognates are used for protolanguage reconstruction and
cross-language dictionary lookup. Cognates can also improve the quality of
machine translation, word alignment, and bilingual lexicon induction.
Current Solution Create a score matrix for all word pairs based on weighted
combinations of component scores. The component scores are computed on the
basis of word context information, word frequency information, temporal
information, word burstiness information, and phonetic information.
Our Contribution We propose a new algorithm for rescoring the matrix by taking into
account scores assigned to other word pairs. Precision and recall are improved by
large amounts in experiments across three language pairs with both traditional
and new large data testing conditions.
Motivation
Spanish
Portuguese
· · · cozinhar · · · andar
.
.
. ...
caminar
.80 (2) .70 (1)
.40 (2) .70 (1)
.20 .70
.
.
. ...
cocinar
.99 (1) .20 (2)
.99 (1) .10 (2)
.99 .05
Blue - Score from Initial Score Matrix
(#) - Reverse Rank
Green - Score after Reverse Rank
(#) - Forward Rank
Black - Score after Forward Rank
General Task
Let X = {x1, x2, ..., xn}. Let Y = {y1, y2, ..., yn}.
Extract (x, y) pairs such that (x, y) are in some relation R.
General Algorithm
Score Matrix
ScoreX,Y =



sx1,y1
· · · sx1,yn
.
.
. ...
.
.
.
sxn,y1
· · · sxn,yn


 (1)
Rescoring
Reverse Rank
reverse rank(xi, yj) = |{xk ∈ X|sxk,yj
≥ sxi,yj
}| (2)
Score RR
scoreRR(xi, yj) =
sxi,yj
reverse rank(xi, yj)
(3)
Forward Rank
forward rank(xi, yj) = |{yk ∈ Y|sxi,yk
≥ sxi,yj
}| (4)
Combining Reverse Rank and Forward Rank
1-Step Approach
scoreRR FR 1step(xi, yj) =
sxi,yj
product
, (5)
where
product =reverse rank(xi, yj)×
forward rank(xi, yj).
(6)
2-Step Approach
scoreRR FR 2step involves first computing the reverse rank and re-adjusting
every score based on the reverse ranks. Then in a second step the new
scores are used to compute forward ranks and then those scores are
adjusted based on the forward ranks.
Maximum Assignment
We used the Hungarian Algorithm to optimize:
max
Z⊆X×Y
(x,y)∈Z
score(x, y)
s.t. (xi, yj) ∈ Z ⇒ (xk, yj) /∈ Z, ∀k = i
(xi, yj) ∈ Z ⇒ (xi, yk) /∈ Z, ∀k = j.
(7)
Computation of Initial Score Matrix
Lemmatization
English - NLTK WordNetLemmatizer [Bird et al., 2009]
French, German, Spanish - TreeTagger [Schmid, 1994]
Word Context Information 2012 Google 5-gram corpus for English, French, German,
and Spanish [Michel et al., 2010]
Frequency Information Computed using the same corpora as Word Context
Information.
Temporal Information Computed using the following corpora:
English - English Gigaword Fifth Edition
French - French Gigaword Fifth Edition
Spanish - Spanish Gigaword Fifth Edition
German - Web crawling and extracting news articles from
http://www.tagesspiegel.de/
Word Burstiness Information Computed using the same corpora as Temporal
Information.
Phonetic Information Computed using a measurement based on Normalized Edit
Distance (NED).
Combining Information Sources For each candidate cognate pair (x, y), its final score
is:
score(x, y) =
m∈metrics
wmscorem(x, y), (8)
where metrics is the set of measurements listed above; wm is the learned weight
for metric m; and scorem(x, y) is the score assigned to the pair (x, y) by metric
m.
Experiments
We used the following language pairs:
French-English
German-English
Spanish-English
We used the following testing conditions:
Small Data (Traditional)
Large Data (New)
We used the following as our baseline:
Initial Score Matrix without any rescoring
Using Global Constraints to Rescore - Results
0 20 40 60 80 100
Recall
0
20
40
60
80
100
Precision
Precision-Recall Curves
Baseline
RR
RR_FR_1step
RR_FR_2step
Max Assignment Max Assignment Score
Figure: Precision-Recall Curves for French-English
(small data)
Method Max F1 11-point IAP
Baseline 54.92 50.99
RR 62.94 59.62
RR FR 1step 68.35 64.42
RR FR 2step 69.72 67.29
Table: French-English Performance (small data)
0 20 40 60 80 100
Recall
0
20
40
60
80
100
Precision
Precision-Recall Curves
Baseline RR RR_FR_1step RR_FR_2step
Figure: Precision-Recall Curves for French-English
(large data)
Method Max F1 11-point IAP
Baseline 55.08 51.35
RR 60.88 58.79
RR FR 1step 65.87 63.55
RR FR 2step 65.76 65.26
Table: French-English Performance (large data)
Conclusion
We presented new methods for rescoring a matrix of initial scores and new large data
testing conditions for evaluation. Our new methods are complementary to existing state
of the art methods, easy to implement, computationally efficient, and effective in
improving performance.
Acknowledgment
We would like to thank TCNJ students Ethan Kochis and Garrett Beatty for their help
with building this poster.
Bibliography
Bird, S., Klein, E., and Loper, E. (2009).
Natural Language Processing with Python.
O’Reilly Media, Inc., 1st edition.
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Team, T. G. B., Pickett, J. P., Hoiberg, D., Clancy, D., Norvig,
P., Orwant, J., Pinker, S., Nowak, M. A., and Aiden, E. L. (2010).
Quantitative analysis of culture using millions of digitized books.
Science, 331(6014):176–182.
Schmid, H. (1994).
Part-of-speech tagging with neural networks.
In COLING, pages 172–176.
http://www.aclweb.org/anthology/C/C94/C94-1027.pdf.
Bloodgood & Strauss Using Global Constraints and Reranking to Improve Cognates Detection

More Related Content

Similar to Improving Cognates Detection with Global Constraints and Reranking

Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
Second Order Heuristics in ACGP
Second Order Heuristics in ACGPSecond Order Heuristics in ACGP
Second Order Heuristics in ACGPhauschildm
 
Pairwise testing sagar_hadawale
Pairwise  testing sagar_hadawalePairwise  testing sagar_hadawale
Pairwise testing sagar_hadawaleSagar Hadawale
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...Aboul Ella Hassanien
 
Vladimir Milov and Andrey Savchenko - Classification of Dangerous Situations...
Vladimir Milov and  Andrey Savchenko - Classification of Dangerous Situations...Vladimir Milov and  Andrey Savchenko - Classification of Dangerous Situations...
Vladimir Milov and Andrey Savchenko - Classification of Dangerous Situations...AIST
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...Lifeng (Aaron) Han
 
Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validationStéphane Canu
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingArthur Charpentier
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...AmirParnianifard1
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
Advance mathematics mid term presentation rev01
Advance mathematics mid term presentation rev01Advance mathematics mid term presentation rev01
Advance mathematics mid term presentation rev01Nirmal Joshi
 
20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptx20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptxssuser7807522
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Generic Reinforcement Schemes and Their Optimization
Generic Reinforcement Schemes and Their OptimizationGeneric Reinforcement Schemes and Their Optimization
Generic Reinforcement Schemes and Their Optimizationinfopapers
 
Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationDominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationIlya Loshchilov
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxGairuzazmiMGhani
 
An automatic test data generation for data flow
An automatic test data generation for data flowAn automatic test data generation for data flow
An automatic test data generation for data flowWafaQKhan
 
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithmOptimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithminfopapers
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...SSA KPI
 

Similar to Improving Cognates Detection with Global Constraints and Reranking (20)

Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Second Order Heuristics in ACGP
Second Order Heuristics in ACGPSecond Order Heuristics in ACGP
Second Order Heuristics in ACGP
 
Pairwise testing sagar_hadawale
Pairwise  testing sagar_hadawalePairwise  testing sagar_hadawale
Pairwise testing sagar_hadawale
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...
 
Vladimir Milov and Andrey Savchenko - Classification of Dangerous Situations...
Vladimir Milov and  Andrey Savchenko - Classification of Dangerous Situations...Vladimir Milov and  Andrey Savchenko - Classification of Dangerous Situations...
Vladimir Milov and Andrey Savchenko - Classification of Dangerous Situations...
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
 
Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validation
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive Modeling
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
Advance mathematics mid term presentation rev01
Advance mathematics mid term presentation rev01Advance mathematics mid term presentation rev01
Advance mathematics mid term presentation rev01
 
20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptx20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptx
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Generic Reinforcement Schemes and Their Optimization
Generic Reinforcement Schemes and Their OptimizationGeneric Reinforcement Schemes and Their Optimization
Generic Reinforcement Schemes and Their Optimization
 
Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationDominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptx
 
An automatic test data generation for data flow
An automatic test data generation for data flowAn automatic test data generation for data flow
An automatic test data generation for data flow
 
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithmOptimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
 

More from Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Recently uploaded (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Improving Cognates Detection with Global Constraints and Reranking

  • 1. Using Global Constraints and Reranking to Improve Cognates Detection Michael Bloodgood1 Benjamin Strauss2 1 Department of Computer Science The College of New Jersey 2 Computer Science and Engineering Dept. The Ohio State University Introduction Background Cognates detection is the task of identifying words across languages that have a common origin. Cognates are used for protolanguage reconstruction and cross-language dictionary lookup. Cognates can also improve the quality of machine translation, word alignment, and bilingual lexicon induction. Current Solution Create a score matrix for all word pairs based on weighted combinations of component scores. The component scores are computed on the basis of word context information, word frequency information, temporal information, word burstiness information, and phonetic information. Our Contribution We propose a new algorithm for rescoring the matrix by taking into account scores assigned to other word pairs. Precision and recall are improved by large amounts in experiments across three language pairs with both traditional and new large data testing conditions. Motivation Spanish Portuguese · · · cozinhar · · · andar . . . ... caminar .80 (2) .70 (1) .40 (2) .70 (1) .20 .70 . . . ... cocinar .99 (1) .20 (2) .99 (1) .10 (2) .99 .05 Blue - Score from Initial Score Matrix (#) - Reverse Rank Green - Score after Reverse Rank (#) - Forward Rank Black - Score after Forward Rank General Task Let X = {x1, x2, ..., xn}. Let Y = {y1, y2, ..., yn}. Extract (x, y) pairs such that (x, y) are in some relation R. General Algorithm Score Matrix ScoreX,Y =    sx1,y1 · · · sx1,yn . . . ... . . . sxn,y1 · · · sxn,yn    (1) Rescoring Reverse Rank reverse rank(xi, yj) = |{xk ∈ X|sxk,yj ≥ sxi,yj }| (2) Score RR scoreRR(xi, yj) = sxi,yj reverse rank(xi, yj) (3) Forward Rank forward rank(xi, yj) = |{yk ∈ Y|sxi,yk ≥ sxi,yj }| (4) Combining Reverse Rank and Forward Rank 1-Step Approach scoreRR FR 1step(xi, yj) = sxi,yj product , (5) where product =reverse rank(xi, yj)× forward rank(xi, yj). (6) 2-Step Approach scoreRR FR 2step involves first computing the reverse rank and re-adjusting every score based on the reverse ranks. Then in a second step the new scores are used to compute forward ranks and then those scores are adjusted based on the forward ranks. Maximum Assignment We used the Hungarian Algorithm to optimize: max Z⊆X×Y (x,y)∈Z score(x, y) s.t. (xi, yj) ∈ Z ⇒ (xk, yj) /∈ Z, ∀k = i (xi, yj) ∈ Z ⇒ (xi, yk) /∈ Z, ∀k = j. (7) Computation of Initial Score Matrix Lemmatization English - NLTK WordNetLemmatizer [Bird et al., 2009] French, German, Spanish - TreeTagger [Schmid, 1994] Word Context Information 2012 Google 5-gram corpus for English, French, German, and Spanish [Michel et al., 2010] Frequency Information Computed using the same corpora as Word Context Information. Temporal Information Computed using the following corpora: English - English Gigaword Fifth Edition French - French Gigaword Fifth Edition Spanish - Spanish Gigaword Fifth Edition German - Web crawling and extracting news articles from http://www.tagesspiegel.de/ Word Burstiness Information Computed using the same corpora as Temporal Information. Phonetic Information Computed using a measurement based on Normalized Edit Distance (NED). Combining Information Sources For each candidate cognate pair (x, y), its final score is: score(x, y) = m∈metrics wmscorem(x, y), (8) where metrics is the set of measurements listed above; wm is the learned weight for metric m; and scorem(x, y) is the score assigned to the pair (x, y) by metric m. Experiments We used the following language pairs: French-English German-English Spanish-English We used the following testing conditions: Small Data (Traditional) Large Data (New) We used the following as our baseline: Initial Score Matrix without any rescoring Using Global Constraints to Rescore - Results 0 20 40 60 80 100 Recall 0 20 40 60 80 100 Precision Precision-Recall Curves Baseline RR RR_FR_1step RR_FR_2step Max Assignment Max Assignment Score Figure: Precision-Recall Curves for French-English (small data) Method Max F1 11-point IAP Baseline 54.92 50.99 RR 62.94 59.62 RR FR 1step 68.35 64.42 RR FR 2step 69.72 67.29 Table: French-English Performance (small data) 0 20 40 60 80 100 Recall 0 20 40 60 80 100 Precision Precision-Recall Curves Baseline RR RR_FR_1step RR_FR_2step Figure: Precision-Recall Curves for French-English (large data) Method Max F1 11-point IAP Baseline 55.08 51.35 RR 60.88 58.79 RR FR 1step 65.87 63.55 RR FR 2step 65.76 65.26 Table: French-English Performance (large data) Conclusion We presented new methods for rescoring a matrix of initial scores and new large data testing conditions for evaluation. Our new methods are complementary to existing state of the art methods, easy to implement, computationally efficient, and effective in improving performance. Acknowledgment We would like to thank TCNJ students Ethan Kochis and Garrett Beatty for their help with building this poster. Bibliography Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc., 1st edition. Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Team, T. G. B., Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A., and Aiden, E. L. (2010). Quantitative analysis of culture using millions of digitized books. Science, 331(6014):176–182. Schmid, H. (1994). Part-of-speech tagging with neural networks. In COLING, pages 172–176. http://www.aclweb.org/anthology/C/C94/C94-1027.pdf. Bloodgood & Strauss Using Global Constraints and Reranking to Improve Cognates Detection