Simple effective decipherment via combinatorial optimization

Transcript

• 1. Simple Effective Decipherment via Combinatorial Optimization

Taylor Berg-Kirkpatrick and Dan Klein
Computer Science Division, University of California at Berkeley
{tberg, klein}@cs.berkeley.edu

Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 313–321, Edinburgh, Scotland, UK, July 27–31, 2011. © 2011 Association for Computational Linguistics

Abstract

We present a simple objective function that, when optimized, yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.

1 Introduction

Decipherment induces a correspondence between the words in an unknown language and the words in a known language. We focus on the setting where a close correspondence between the alphabets of the two languages exists, but is unknown. Given only two lists of words, the lexicons of both languages, we attempt to induce the correspondence between alphabets and identify the cognate pairs present in the lexicons. The system we propose accomplishes this by defining a simple combinatorial optimization problem that is a function of both the alphabet and cognate matchings, and then induces correspondences by optimizing the objective using a block coordinate descent procedure.

There is a range of past work that has variously investigated cognate detection (Kondrak, 2001; Bouchard-Côté et al., 2007; Bouchard-Côté et al., 2009; Hall and Klein, 2010), character-level decipherment (Knight and Yamada, 1999; Knight et al., 2006; Snyder et al., 2010; Ravi and Knight, 2011), and bilingual lexicon induction (Koehn and Knight, 2002; Haghighi et al., 2008). We consider a common element, which is a model wherein there are character-level correspondences and word-level correspondences, with the word matching parameterized by the character one. This approach subsumes a range of past tasks, though of course past work has specialized in interesting ways.

Past work has emphasized the modeling aspect; here we instead use a parametrically simple model and emphasize inference.

2 Decipherment as Two-Level Optimization

Our method represents two matchings, one at the alphabet level and one at the lexicon level. A vector of variables $x$ specifies a matching between alphabets. For each character $i$ in the source alphabet and each character $j$ in the target alphabet we define an indicator variable $x_{ij}$ that is on if and only if character $i$ is mapped to character $j$. Similarly, a vector $y$ represents a matching between lexicons. For word $u$ in the source lexicon and word $v$ in the target lexicon, the indicator variable $y_{uv}$ denotes that $u$ maps to $v$. Note that the matchings need not be one-to-one.

We define an objective function on the matching variables as follows. Let $\mathrm{EditDist}(u, v; x)$ denote the edit distance between source word $u$ and target word $v$ given alphabet matching $x$. Let the length of word $u$ be $l_u$ and the length of word $v$ be $l_v$. This edit distance depends on $x$ in the following way. Insertions and deletions always cost a constant $\epsilon$ (footnote 1: in practice we set $\epsilon = \frac{1}{l_u + l_v}$; $l_u + l_v$ is the maximum number of edit operations between words $u$ and $v$, and this normalization ensures that edit distances are between 0 and 1 for all pairs of words). Substitutions also cost $\epsilon$ unless the characters are matched in $x$, in which case the substitution is
• 2. free. Now, the objective that we will minimize can be stated simply: $\sum_u \sum_v y_{uv} \cdot \mathrm{EditDist}(u, v; x)$, the sum of the edit distances between the matched words, where the edit distance function is parameterized by the alphabet matching.

Without restrictions on the matchings $x$ and $y$ this objective can always be driven to zero by either mapping all characters to all characters, or matching none of the words. It is thus necessary to restrict the matchings in some way. Let $I$ be the size of the source alphabet and $J$ be the size of the target alphabet. We allow the alphabet matching $x$ to be many-to-many but require that each character participate in no more than two mappings and that the total number of mappings be $\max(I, J)$, a constraint we refer to as restricted-many-to-many. The requirements can be encoded with the following linear constraints on $x$:

$$\forall i \quad \sum_j x_{ij} \le 2$$
$$\forall j \quad \sum_i x_{ij} \le 2$$
$$\sum_i \sum_j x_{ij} = \max(I, J)$$

The lexicon matching $y$ is required to be $\tau$-one-to-one. By this we mean that $y$ is an at-most-one-to-one matching that covers proportion $\tau$ of the smaller of the two lexicons. Let $U$ be the size of the source lexicon and $V$ be the size of the target lexicon. This requirement can be encoded with the following linear constraints:

$$\forall u \quad \sum_v y_{uv} \le 1$$
$$\forall v \quad \sum_u y_{uv} \le 1$$
$$\sum_u \sum_v y_{uv} = \tau \min(U, V)$$

Now we are ready to define the full optimization problem. The first formulation is called the Implicit Matching Objective since it includes an implicit minimization over edit alignments inside the computation of $\mathrm{EditDist}$.

(1) Implicit Matching Objective:

$$\min_{x, y} \; \sum_u \sum_v y_{uv} \cdot \mathrm{EditDist}(u, v; x)$$
$$\text{s.t. } x \text{ is restricted-many-to-many}, \quad y \text{ is } \tau\text{-one-to-one}$$

In order to get a better handle on the shape of the objective and to develop an efficient optimization procedure we decompose each edit distance computation and reformulate the optimization problem in Section 2.2.

2.1 Example

Figure 1 presents both an example matching problem and a diagram of the variables and objective. Here, the source lexicon consists of the English words (cat, bat, cart, rat, cab), and the source alphabet consists of the characters (a, b, c, r, t). The target alphabet is (0, 1, 2, 3). We have used digits as symbols in the target alphabet to make it clear that we treat the alphabets as disjoint. We have no prior knowledge about any correspondence between alphabets, or between lexicons.

The target lexicon consists of the words (23, 1233, 120, 323, 023). The bipartite graphs show a specific setting of the matching variables. The bold edges correspond to the $x_{ij}$ and $y_{uv}$ that are one. The matchings shown achieve an edit distance of zero between all matched word pairs except for the pair (cat, 23). The best edit alignment for this pair is also diagrammed. Here, 'a' is aligned to '2', 't' is aligned to '3', and 'c' is deleted and therefore aligned to the null position '#'. Only the initial deletion has a non-zero cost since all other alignments correspond to substitutions between characters that are matched in $x$.

2.2 Explicit Objective

Computing $\mathrm{EditDist}(u, v; x)$ requires running a dynamic program because the edit alignments are unknown. Here we define those alignments $z$ explicitly, which makes $\mathrm{EditDist}(u, v; x)$ easy to write out at the cost of more variables. By writing the objective in an explicit form that refers to these edit variables, we are able to describe an efficient block coordinate descent procedure that can be used for optimization.

$\mathrm{EditDist}(u, v; x)$ is computed by minimizing over the set of monotonic alignments between the characters of the source word $u$ and the characters of the target word $v$. Let $u_n$ be the character at the $n$th position of the source word $u$, and similarly for
• 3. $v_m$. Let $z_{uv}$ be the vector of alignment variables for the edit distance computation between source word $u$ and target word $v$, where entry $z_{uv,nm}$ indicates whether the character at position $n$ of source word $u$ is aligned to the character at position $m$ of target word $v$. Additionally, define variables $z_{uv,n\#}$ and $z_{uv,\#m}$ denoting null alignments, which will be used to keep track of deletions and insertions.

[Figure 1 (diagram not reproduced). Panels: Alphabet Matching ($x_{ij}$), Lexicon Matching ($y_{uv}$), Matching Problem, and Edit Distance (substitution, deletion, insertion).]

Figure 1: An example problem displaying source and target lexicons and alphabets, along with specific matchings. The variables involved in the optimization problem are diagrammed. $x$ are the alphabet matching indicator variables, $y$ are the lexicon matching indicator variables, and $z$ are the edit alignment indicator variables. The index $u$ refers to a word in the source lexicon, $v$ refers to a word in the target lexicon, $i$ refers to a character in the source alphabet, and $j$ refers to a character in the target alphabet. $n$ and $m$ refer to positions in source and target words respectively. The matching objective function is also shown.

$$\mathrm{EditDist}(u, v; x) = \min_{z_{uv}} \; \epsilon \cdot \big( \mathrm{Sub}(z_{uv}, x) + \mathrm{Del}(z_{uv}) + \mathrm{Ins}(z_{uv}) \big) \quad \text{s.t. } z_{uv} \text{ is monotonic}$$

We define $\mathrm{Sub}(z_{uv}, x)$ to be the number of substitutions between characters that are not matched in $x$, $\mathrm{Del}(z_{uv})$ to be the number of deletions, and $\mathrm{Ins}(z_{uv})$ to be the number of insertions.

$$\mathrm{Sub}(z_{uv}, x) = \sum_{n,m} (1 - x_{u_n v_m}) \, z_{uv,nm}$$
$$\mathrm{Del}(z_{uv}) = \sum_n z_{uv,n\#}$$
$$\mathrm{Ins}(z_{uv}) = \sum_m z_{uv,\#m}$$

Notice that the variable $z_{uv,nm}$ being turned on indicates the substitute operation, while a $z_{uv,n\#}$ or $z_{uv,\#m}$ being turned on indicates a delete or insert operation, respectively. These variables are diagrammed in Figure 1. The requirement that $z_{uv}$ be a monotonic alignment can be expressed using linear constraints, but in our optimization procedure (described in Section 3) these constraints need not be explicitly represented.

Now we can substitute the explicit edit distance equation into the implicit matching objective (1).
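
As an illustration of the edit distance just defined, the following is a minimal Python sketch, not the authors' implementation. It assumes the alphabet matching is given as a set of (source character, target character) pairs, uses the per-operation cost 1 / (l_u + l_v) from footnote 1, and treats a pair of empty words as having distance 0. The names `edit_dist` and `implicit_objective` are hypothetical.

```python
def edit_dist(u, v, x):
    """Edit distance between source word u and target word v given alphabet
    matching x, a set of (source_char, target_char) pairs (assumed encoding).

    Insertions and deletions cost eps; a substitution costs eps unless the
    two characters are matched in x, in which case it is free.
    """
    if not u and not v:
        return 0.0  # assumption: the source does not discuss empty words
    eps = 1.0 / (len(u) + len(v))
    # d[n][m] = minimum number of paid operations aligning u[:n] with v[:m]
    d = [[0] * (len(v) + 1) for _ in range(len(u) + 1)]
    for n in range(1, len(u) + 1):
        d[n][0] = n  # delete every character of u[:n]
    for m in range(1, len(v) + 1):
        d[0][m] = m  # insert every character of v[:m]
    for n in range(1, len(u) + 1):
        for m in range(1, len(v) + 1):
            sub = 0 if (u[n - 1], v[m - 1]) in x else 1
            d[n][m] = min(
                d[n - 1][m] + 1,        # deletion: u_n aligned to '#'
                d[n][m - 1] + 1,        # insertion: '#' aligned to v_m
                d[n - 1][m - 1] + sub,  # substitution: u_n aligned to v_m
            )
    return eps * d[len(u)][len(v)]


def implicit_objective(matched_pairs, x):
    """Sum of edit distances over the word pairs (u, v) with y_uv = 1."""
    return sum(edit_dist(u, v, x) for u, v in matched_pairs)


if __name__ == "__main__":
    # The (cat, 23) pair from Section 2.1: 'a'->'2' and 't'->'3' are matched,
    # so only the deletion of 'c' is paid, at cost 1 / (3 + 2).
    x = {("a", "2"), ("t", "3")}
    print(edit_dist("cat", "23", x))  # 0.2
```

The table stores counts of paid operations and scales by the cost once at the end, which is equivalent because every paid operation has the same cost for a given word pair.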
• 4. Noticing that the mins and sums commute, we arrive at the explicit form of the matching optimization problem.

(2) Explicit Matching Objective:

$$\min_{x, y, z} \; \sum_{u,v} y_{uv} \cdot \epsilon \cdot \big( \mathrm{Sub}(z_{uv}, x) + \mathrm{Del}(z_{uv}) + \mathrm{Ins}(z_{uv}) \big)$$
$$\text{s.t. } x \text{ is restricted-many-to-many}, \quad y \text{ is } \tau\text{-one-to-one}, \quad \forall u, v \;\; z_{uv} \text{ is monotonic}$$

The implicit and explicit optimizations are the same, apart from the fact that the explicit optimization now explicitly represents the edit alignment variables $z$. Let the explicit matching objective (2) be denoted as $J(x, y, z)$. The relaxation of the explicit problem with 0-1 constraints removed has integer solutions (footnote 2); however, the objective $J(x, y, z)$ is non-convex. We thus turn to a block coordinate descent method in the next section in order to find local optima.

Footnote 2: This can be shown by observing that optimizing $x$ when $y$ and $z$ are held fixed yields integer solutions (shown in Section 3.2), and similarly for the optimization of $y$ and $z$ when $x$ is fixed (shown in Section 3.1). Thus, every local optimum with respect to these block coordinate updates has integer solutions. The global optimum must be one of these local optima.

3 Optimization Method

We now state a block coordinate descent procedure to find local optima of $J(x, y, z)$ under the constraints on $x$, $y$, and $z$. This procedure alternates between updating $y$ and $z$ to their exact joint optima when $x$ is held fixed, and updating $x$ to its exact optimum when $y$ and $z$ are held fixed.

The pseudocode for the procedure is given in Algorithm 1. Note that the function $\mathrm{EditDist}$ returns both the min edit distance $e_{uv}$ and the argmin edit alignments $z_{uv}$. Also note that $c_{ij}$ is as defined in Section 3.2.

Algorithm 1 Block Coordinate Descent
    Randomly initialize alphabet matching x.
    repeat
        for all u, v do
            (e_uv, z_uv) ← EditDist(u, v; x)
        end for
        [Hungarian]
        y ← argmin over τ-one-to-one y of Σ_{u,v} y_uv e_uv
        [Solve LP]
        x ← argmax over restricted-many-to-many x of Σ_{i,j} x_ij c_ij
    until convergence

3.1 Lexicon Matching Update

Let $x$, the alphabet matching variable, be fixed. We consider the problem of optimizing $J(x, y, z)$ over the lexicon matching variable $y$ and the edit alignments $z$ under the constraint that $y$ is $\tau$-one-to-one and each $z_{uv}$ is monotonic.

Notice that $y$ simply picks out which edit distance problems affect the objective. The $z_{uv}$ in each of these edit distance problems can be optimized independently. Any $z_{uv}$ that does not have $y_{uv}$ active has no effect on the objective, and any $z_{uv}$ with $y_{uv}$ active can be optimized using the standard edit distance dynamic program. Thus, in a first step we compute the $U \cdot V$ edit distances $e_{uv}$ and best monotonic alignment variables $z_{uv}$ between all pairs of source and target words using $U \cdot V$ calls to the standard edit distance dynamic program. Altogether, this takes time $O\big((\sum_u l_u) \cdot (\sum_v l_v)\big)$.

Now, in a second step we compute the minimum-weight $\tau$-one-to-one matching $y$ under the weights $e_{uv}$. This can be accomplished in time $O(\max(U, V)^3)$ using the Hungarian algorithm (Kuhn, 1955). These two steps produce $y$ and $z$ that exactly achieve the optimum value of $J(x, y, z)$ for the given value of $x$.

3.2 Alphabet Matching Update

Let $y$ and $z$, the lexicon matching variables and the edit alignments, be fixed. Now, we find the optimal alphabet matching variables $x$ subject to the constraint that $x$ is restricted-many-to-many.

To optimize $J(x, y, z)$ with respect to $x$, it makes sense to prioritize mappings $x_{ij}$ that would mitigate the largest substitution costs in the active edit distance problems. Indeed, with a little algebra it can be shown that solving a maximum weighted matching problem with weights $c_{ij}$ that count potential substitution costs gives the correct update for $x$. In particular, $c_{ij}$ is the total cost of substitution edits in the active edit alignment prob-