1. Andi Wu
Asia Bible Society
From Identical Strings
to Similar Strings
Intelligent Search of Biblical Texts Based on
Syntax and Semantics
2. Original Motivation
Systematic approach to Bible translation
To make the translation consistent,
translators need to know not only the
phrases that are identical but also the phrases
that differ in form yet are similar in meaning.
Asia Bible Society 2
3. Identical Strings vs. Similar Strings
Traditional Search:
Based on matches in form
Same words
Same word order
Intelligent Search:
Based on matches in meaning
Words can be different
Word order can be different
5. Example of Similar Strings:
Same words in a different order
Jeremiah 2:1
Ezekiel 24:20
6. Example of Similar Strings:
Different words in a different order
Proverbs 1:7
Psalms 111:10
7. Similar Strings
Strings that are similar in meaning
Similar words in similar syntactic
relationships
Need in Bible translation
8. The importance of Syntactic Relations
Similar strings != strings containing similar words
The same words in different syntactic relations can
mean very different things
An old man with a dog chased a young lady with an umbrella.
vs.
An old lady with a dog chased a young man with an umbrella.
9. Semantic Units of Sentences
Triples: dependency relationships between
two words
e.g. In the beginning God created the heavens
and the earth.
God – create (subject-verb)
create – heavens (verb-object)
create – earth (verb-object)
create – in the beginning (verb-adverbial)
heavens – earth (conjunction)
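The triples listed for Genesis 1:1 map naturally onto a small record type. The following is a minimal sketch, not the presenter's implementation: the `Triple` type and its field names `left`, `right` and `relation` are assumptions made for illustration, and the triples are typed in by hand rather than extracted from a parse tree.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One dependency relationship between two words (hypothetical layout)."""
    left: str      # first element as written on the slide
    right: str     # second element as written on the slide
    relation: str  # label of the dependency relation


# Triples for "In the beginning God created the heavens and the earth."
# copied by hand from the slide.
GENESIS_1_1 = frozenset({
    Triple("God", "create", "subject-verb"),
    Triple("create", "heavens", "verb-object"),
    Triple("create", "earth", "verb-object"),
    Triple("create", "in the beginning", "verb-adverbial"),
    Triple("heavens", "earth", "conjunction"),
})

# Because triples are hashable, a sentence can be treated as a set of them.
objects_of_create = sorted(
    t.right for t in GENESIS_1_1
    if t.left == "create" and t.relation == "verb-object"
)
```

Storing the triples in a set rather than a list reflects the idea that word order no longer matters once the relations are known.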
10. Different Strings With the Same Triples
God created the heavens and the earth.
The heavens and the earth were created by God.
God created the heavens and He created the earth.
It is God who created the heavens and the earth.
God – create (subject-verb)
create – heavens (verb-object)
create – earth (verb-object)
heavens – earth (conjunction)
11. Different Strings With Similar Triples
God created man in his own image.
Adam is the man that God created.
Man was created by God on the sixth day.
I am a man created by God.
Triples in common:
God – create (subject-verb)
create – man (verb-object)
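Finding the triples that several sentences have in common amounts to a set intersection. The sketch below assumes each sentence has already been analysed into triples. Only the two shared triples come from the slide; the remaining triples are hand-written guesses added for illustration.

```python
# The two triples the slide lists as common to all sentences.
GOD_CREATE = ("God", "create", "subject-verb")
CREATE_MAN = ("create", "man", "verb-object")

# Hypothetical analyses: every triple other than the two above is an
# illustrative assumption, not output of a real parser.
sentences = {
    "God created man in his own image.": {
        GOD_CREATE, CREATE_MAN,
        ("create", "in his own image", "verb-adverbial"),
    },
    "Man was created by God on the sixth day.": {
        GOD_CREATE, CREATE_MAN,
        ("create", "on the sixth day", "verb-adverbial"),
    },
    "I am a man created by God.": {
        GOD_CREATE, CREATE_MAN,
        ("I", "be", "subject-verb"),
    },
}

# Triples shared by every sentence.
common = set.intersection(*sentences.values())
```

Under these assumptions `common` contains exactly the two triples listed on the slide, even though the sentences differ in voice and word order.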
12. Similar Triples With Different Words
His troops were annihilated.
His army was destroyed.
His forces were wiped out.
annihilate – troops (verb-object)
destroy – army (verb-object)
wipe out – forces (verb-object)
13. Data Requirement
To recognize similar strings in Biblical texts,
we need
Syntactic analysis of the original Hebrew
and Greek texts
Synonym database of Hebrew and Greek
Both of them have already been developed
at Asia Bible Society
16. Triples
Extracted from the trees
Strings for comparison:
Text covered by each node/subtree
Similar strings:
Subtrees containing similar triples
19. Compute Similarities Between Subtrees
Semantic space of a subtree:
The set of triples (including their synonymous
expansions) contained in the subtree
Similar subtrees
Subtrees whose semantic spaces overlap
(set intersection)
Degree of similarity
Set Intersection / Set Union
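One way to read "semantic space" is as the set of triples plus every variant obtained by swapping in synonyms. The sketch below follows that reading. The synonym groups, the function names and the English example words are assumptions for illustration; the actual system works on Hebrew and Greek with its own synonym database.

```python
from itertools import product

# Hypothetical synonym groups (English stand-ins for Hebrew/Greek lemmas).
SYNONYM_GROUPS = [
    {"annihilate", "destroy", "wipe out"},
    {"troops", "army", "forces"},
]


def synonyms_of(word):
    """Return the word together with its known synonyms."""
    for group in SYNONYM_GROUPS:
        if word in group:
            return group
    return {word}


def semantic_space(triples):
    """Expand each triple into all of its synonym variants."""
    space = set()
    for left, right, relation in triples:
        for l, r in product(synonyms_of(left), synonyms_of(right)):
            space.add((l, r, relation))
    return space


def similarity(space_a, space_b):
    """Degree of similarity: size of intersection over size of union."""
    union = space_a | space_b
    if not union:
        return 0.0
    return len(space_a & space_b) / len(union)


a = semantic_space({("annihilate", "troops", "verb-object")})
b = semantic_space({("destroy", "army", "verb-object")})
c = semantic_space({("destroy", "city", "verb-object")})
```

With these toy groups, `a` and `b` expand to the same nine triples and so overlap completely, while `a` and `c` share nothing because the object differs.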
20. Semantic Distance
Distance = -log(|Intersection| / |Union|) (natural logarithm)
Set A = { a, b, c }
Set B = { b, c, d, e }
Intersection = { b, c }
Union = { a, b, c, d, e }
Distance(A,B) = -log(2/5) = 0.9162907318742
Set C = { a, b, c, d }
Set D = { c, e, f, g, h }
Intersection = { c }
Union = { a, b, c, d, e, f, g, h }
Distance(C,D) = -log(1/8) = 2.0794415416798
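The two worked examples can be checked with a few lines of code. This is a minimal sketch assuming the natural logarithm, which is what the quoted values imply. The function name `semantic_distance` and the choice to return infinity for disjoint sets are assumptions, since the slide does not say how an empty intersection is handled.

```python
import math


def semantic_distance(a, b):
    """Negative natural log of |intersection| / |union|."""
    shared = a & b
    if not shared:
        # log(0) is undefined; treating disjoint sets as infinitely
        # distant is an assumption made for this sketch.
        return math.inf
    return -math.log(len(shared) / len(a | b))


A = {"a", "b", "c"}
B = {"b", "c", "d", "e"}
C = {"a", "b", "c", "d"}
D = {"c", "e", "f", "g", "h"}

d_ab = semantic_distance(A, B)  # 2 shared out of 5
d_cd = semantic_distance(C, D)  # 1 shared out of 8
```

Identical sets have distance 0, and the distance grows as the shared portion shrinks, so smaller values mean closer meaning.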
24. Asia Bible Society
Semantic Space of Psalms 14:12
= { repay~person(V-O), as~deed(P-O), deed~him(Poss), repay~as(V-PP) }
Semantic Space of Psalms 62:1
= { reward~everyone(V-O), as~deed(P-O), deed~him(Poss), reward~as(V-PP), you~reward(S-V) }
Intersection = { repay/reward~person/everyone(V-O), as~deed(P-O), deed~him(Poss), repay/reward~as(V-PP) }
Union = { repay/reward~person/everyone(V-O), as~deed(P-O), deed~him(Poss), repay/reward~as(V-PP), you~reward(S-V) }
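The intersection and union above can be reproduced by mapping synonymous words to a shared label before comparing. The sketch below makes several assumptions: the `CANONICAL` table and `canon` helper are hypothetical, the synonym pairs repay/reward and person/everyone are read off the merged entries on the slide, and the distance uses the natural-log formula from the Semantic Distance slide.

```python
import math

# Hypothetical synonym table inferred from the merged entries on the slide.
CANONICAL = {"reward": "repay", "everyone": "person"}


def canon(triple):
    """Replace each word by its canonical synonym, if it has one."""
    left, right, relation = triple
    return (CANONICAL.get(left, left), CANONICAL.get(right, right), relation)


space_1 = {
    ("repay", "person", "V-O"), ("as", "deed", "P-O"),
    ("deed", "him", "Poss"), ("repay", "as", "V-PP"),
}
space_2 = {
    ("reward", "everyone", "V-O"), ("as", "deed", "P-O"),
    ("deed", "him", "Poss"), ("reward", "as", "V-PP"),
    ("you", "reward", "S-V"),
}

a = {canon(t) for t in space_1}
b = {canon(t) for t in space_2}

intersection = a & b
union = a | b
distance = -math.log(len(intersection) / len(union))
```

Under these assumptions the intersection has 4 triples and the union 5, giving a distance of about 0.223, which marks the two verses as very close in meaning.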
25. The computation
Pair-wise comparison of all phrases
Keep pairs with semantic distance < 9.0
1,607,721 in the database
More than 24 hours on a single machine
for the computation
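The pair-wise comparison can be sketched as a loop over all unordered pairs that keeps those under the cutoff. The threshold 9.0 is the value quoted on the slide; the function names, the toy phrase identifiers and the handling of disjoint sets are assumptions, and this is not the presenter's implementation.

```python
import math
from itertools import combinations

THRESHOLD = 9.0  # cutoff quoted on the slide


def semantic_distance(a, b):
    """Negative natural log of |intersection| / |union|."""
    shared = len(a & b)
    if shared == 0:
        return math.inf  # assumption: disjoint spaces never match
    return -math.log(shared / len(a | b))


def similar_pairs(spaces, threshold=THRESHOLD):
    """Compare every pair of phrases and keep those below the threshold."""
    kept = []
    for (id_a, a), (id_b, b) in combinations(spaces.items(), 2):
        d = semantic_distance(a, b)
        if d < threshold:
            kept.append((id_a, id_b, d))
    return sorted(kept, key=lambda pair: pair[2])


# Toy semantic spaces with placeholder phrase identifiers.
spaces = {
    "p1": {"t1", "t2", "t3"},
    "p2": {"t2", "t3", "t4", "t5"},
    "p3": {"t9"},
}
result = similar_pairs(spaces)
```

A loop of this shape performs n(n-1)/2 comparisons for n phrases, which is consistent with the long running time reported on the slide.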
27. Linking OT and NT
Hebrew OT → Septuagint → Greek NT
Automatic alignment
Strong's number matching
Greek Strong's numbers for all words in the OT that
occur in the NT
Match based on Greek Strong's numbers
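Once OT words carry Greek Strong's numbers, comparing an OT phrase with an NT phrase can be reduced to comparing sets of numbers. The sketch below only illustrates that idea. The words and the identifiers `G_1`, `G_2`, `G_3` are placeholders, not real Strong's numbers, and the function names are hypothetical.

```python
# Each word is paired with a placeholder Greek Strong's identifier.
# None marks an OT word for which no Greek number is available.
ot_phrase = [("word_a", "G_1"), ("word_b", "G_2"), ("word_c", None)]
nt_phrase = [("word_x", "G_2"), ("word_y", "G_1"), ("word_z", "G_3")]


def strong_numbers(words):
    """Collect the Greek numbers attached to a phrase, skipping gaps."""
    return {number for _, number in words if number is not None}


def shared_numbers(ot_words, nt_words):
    """Numbers that occur in both phrases."""
    return strong_numbers(ot_words) & strong_numbers(nt_words)


def overlap_ratio(ot_words, nt_words):
    """Share of numbers common to both phrases (intersection over union)."""
    ot, nt = strong_numbers(ot_words), strong_numbers(nt_words)
    union = ot | nt
    return len(ot & nt) / len(union) if union else 0.0


shared = shared_numbers(ot_phrase, nt_phrase)
ratio = overlap_ratio(ot_phrase, nt_phrase)
```

OT words without a Greek number are simply ignored here, which is one reason better Hebrew-to-Septuagint alignment would improve the matches.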
29. Search in Bible translations
Alignment between translations and original
texts
Queries in other languages → queries in
Hebrew/Greek
Search always done in Hebrew/Greek
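Searching through a translation can be pictured as two steps: map the query words to original-language lemmas through the alignment, then search the original text. The sketch below is a toy version under stated assumptions. The alignment table, the lemma labels such as `LEMMA_GOD`, the verse identifiers and the function names are all placeholders, not data from the actual system.

```python
# Hypothetical alignment: translation word -> original-language lemma label.
ALIGNMENT = {
    "god": "LEMMA_GOD",
    "created": "LEMMA_CREATE",
    "made": "LEMMA_MAKE",
}

# Hypothetical index of the original text: verse id -> lemmas it contains.
ORIGINAL_INDEX = {
    "verse-1": {"LEMMA_GOD", "LEMMA_CREATE", "LEMMA_HEAVEN"},
    "verse-2": {"LEMMA_GOD", "LEMMA_MAKE"},
    "verse-3": {"LEMMA_EARTH"},
}


def translate_query(query, alignment):
    """Map query words to original-language lemmas, skipping unaligned words."""
    return [alignment[w] for w in query.lower().split() if w in alignment]


def search(query, alignment, index):
    """Return verses whose lemma set contains every lemma of the query."""
    wanted = set(translate_query(query, alignment))
    if not wanted:
        return []
    return sorted(ref for ref, lemmas in index.items() if wanted <= lemmas)


hits = search("God created", ALIGNMENT, ORIGINAL_INDEX)
```

Because the lookup always happens on original-language lemmas, the same index serves queries from any translation that has been aligned.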
30. Further Improvements
The results will be better if
All the references are annotated
The alignment between the Hebrew OT
and the Septuagint is improved
31. Conclusion
Rich linguistic knowledge (syntactic and
semantic knowledge) enables us to
compare linguistic units on the basis of
meaning rather than form, thus making
the search of Biblical texts more
intelligent.