2. Bible Software: bread and butter
• Library of Bibles and Bible reference resources
• Links among resources
• Search
Drayton Benner | Miklal Software Solutions | DraytonBenner@MiklalSoftware.com
3. Boolean Text Search: goals
• Functionality
  • Powerful
  • Flexible
  • Enabling biblical researchers as well as laypersons
• Not resource intensive
  • Fast
  • Low memory usage
  • Handle old devices
  • Handle server requests without a server farm
• Scale to large libraries
4. Boolean Text Search: functionality
• Multilingual
  • Hebrew, Aramaic, and Greek
  • Bible translations in any language
• Surface forms and lemmas
• AND, OR searches
  • (Jesus or Christ) and lord
• Ordered or not ordered
• Proximity for AND searches
  • [within 3 verses and (1 chapter or 1 paragraph)]
• Imprecise words
  • Wildcards: Abra*m, wom?n
  • Regular expressions
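A minimal sketch of how a query such as (Jesus or Christ) and lord could be evaluated over sorted postings lists, and how a wildcard term could be expanded against the lexicon. The toy index, the verse numbers, and the helper names (`intersect`, `union`, `expand_wildcard`) are illustrative assumptions, not the presenter's implementation. `*` is assumed to match any run of characters (including none) and `?` exactly one character.

```python
import re
from typing import Dict, List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """AND: merge two sorted postings lists, keeping common entries."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def union(a: List[int], b: List[int]) -> List[int]:
    """OR: all entries that occur in either list, sorted."""
    return sorted(set(a) | set(b))


def expand_wildcard(pattern: str, lexicon: List[str]) -> List[str]:
    """Return the lexicon words matching a pattern with * and ? wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # assumed: any run of characters
        elif ch == "?":
            parts.append(".")    # assumed: exactly one character
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts), re.IGNORECASE)
    return [word for word in lexicon if regex.fullmatch(word)]


# Hypothetical index: word -> sorted list of verse numbers.
index: Dict[str, List[int]] = {
    "jesus": [1, 4, 7],
    "christ": [2, 4, 9],
    "lord": [4, 7, 9, 12],
}

# (Jesus or Christ) and lord
hits = intersect(union(index["jesus"], index["christ"]), index["lord"])

lexicon = ["abraham", "abram", "woman", "women", "lord"]
abram_forms = expand_wildcard("Abra*m", lexicon)
woman_forms = expand_wildcard("wom?n", lexicon)
```

In this sketch a wildcard term would then be treated as an OR over the postings of every word it expands to.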
8. Inverted Index Compression
conquer     698; 19,203; 54,897; 246,488; 565,574; 645,106; 802,193
conquering  89,654; 480,737; 749,240
conqueror   513,238
conscience  112,592; 118,761; 192,708; 223,519;
            229,955; 326,459; 331,294; 372,501;
            418,133; 436,530; 456,026; 458,388;
            508,750; 510,470; 538,332; 547,994;
            561,011; 564,817; 595,180; 595,757;
            643,096; 668,186; 677,311; 781,107;
            797,371; 808,857; 810,062
• Store the first number, then the difference between each number and the previous number
• Use compression that works well for small numbers
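The gap idea above can be sketched on the postings for "conquer" from this slide. The function names `to_gaps` and `from_gaps` are illustrative assumptions; the slide does not name them.

```python
from itertools import accumulate
from typing import List


def to_gaps(postings: List[int]) -> List[int]:
    """Keep the first position, then store each difference from the previous one."""
    if not postings:
        return []
    gaps = [postings[0]]
    for previous, current in zip(postings, postings[1:]):
        gaps.append(current - previous)
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Rebuild the original positions with a running sum."""
    return list(accumulate(gaps))


conquer = [698, 19203, 54897, 246488, 565574, 645106, 802193]
conquer_gaps = to_gaps(conquer)
# conquer_gaps == [698, 18505, 35694, 191591, 319086, 79532, 157087]
```

The gaps for a frequent word are much smaller than the absolute positions, which is what makes a small-integer code worthwhile.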
9. Compression for Small Integers
• Fixed bit width
• Delta code
  • Use gamma code for length
  • Offset as with gamma code
• Golomb code
  • Parameterized on token frequency
  • Achieves close to optimal code length
Table from Introduction to Information Retrieval by Christopher D. Manning et al.
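A minimal sketch of the gamma, delta, and Golomb codes for positive integers, returning bit strings for readability. It assumes the unary convention of n ones followed by a zero, and for Golomb it encodes n - 1 with a truncated-binary remainder; both are assumptions of this sketch, and the parameter `m` would in practice be chosen from the token's frequency.

```python
def unary(n: int) -> str:
    """n ones followed by a terminating zero (assumed convention)."""
    return "1" * n + "0"


def gamma_encode(n: int) -> str:
    """Gamma code: unary length of the offset, then the offset itself."""
    if n < 1:
        raise ValueError("gamma code is defined for n >= 1")
    offset = bin(n)[3:]  # binary digits of n with the leading 1 removed
    return unary(len(offset)) + offset


def delta_encode(n: int) -> str:
    """Delta code: gamma-coded bit length of n, then the same offset as gamma."""
    if n < 1:
        raise ValueError("delta code is defined for n >= 1")
    bits = bin(n)[2:]
    return gamma_encode(len(bits)) + bits[1:]


def golomb_encode(n: int, m: int) -> str:
    """Golomb code with parameter m: unary quotient, truncated-binary remainder."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be >= 1")
    quotient, remainder = divmod(n - 1, m)
    width = (m - 1).bit_length()      # ceil(log2(m))
    cutoff = (1 << width) - m         # remainders below this use one bit fewer
    if width == 0:                    # m == 1: no remainder bits are needed
        tail = ""
    elif remainder < cutoff:
        tail = format(remainder, "b").zfill(width - 1)
    else:
        tail = format(remainder + cutoff, "b").zfill(width)
    return unary(quotient) + tail


gamma_13 = gamma_encode(13)    # "1110101": length 1110, offset 101
delta_13 = delta_encode(13)    # "11000101": gamma(4) = 11000, offset 101
golomb_5 = golomb_encode(5, 3)  # "1010": quotient 1 -> 10, remainder 1 -> 10
```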
10. Lexicon Retrieval from Disk
Word            Same as previous    Remaining
conquer         0                   conquer
conquering      7                   ing
conqueror       7                   or
conscience      3                   science
consciousness   6                   ousness
consecrate      4                   ecrate
consent         5                   nt
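The table above stores, for each word in the sorted lexicon, how many leading characters it shares with the previous word and the remaining suffix. A minimal sketch of that encoding and its inverse follows; the function names `front_encode` and `front_decode` are assumptions, and the on-disk layout is not shown in the slides.

```python
from typing import List, Tuple


def front_encode(words: List[str]) -> List[Tuple[int, str]]:
    """For each word, store the shared-prefix length and the remaining suffix."""
    encoded = []
    previous = ""
    for word in words:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each word from the previous word's prefix plus the suffix."""
    words = []
    previous = ""
    for shared, remaining in encoded:
        word = previous[:shared] + remaining
        words.append(word)
        previous = word
    return words


lexicon = ["conquer", "conquering", "conqueror", "conscience",
           "consciousness", "consecrate", "consent"]
encoded_lexicon = front_encode(lexicon)
```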
14. Proximity: multiple hierarchies
[Figure: the text divided into a book, chapters, verses, and tokens, shown alongside a separate division into words]
• Hierarchies need not be ordered identically
  • Allows for multiple orders of books (e.g., Protestant canon, Hebrew Bible)
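A rough sketch of how a proximity test such as [within 3 verses and (1 chapter or 1 paragraph)] could be checked when verses, chapters, and paragraphs are separate hierarchies over the same token positions. The boundary arrays and the names `unit_index`, `within`, and `matches` are made up for illustration. "Within n units" is assumed here to mean that both tokens fall inside a window of n consecutive units, and the case of differently ordered hierarchies is not covered.

```python
from bisect import bisect_right
from typing import List

# Hypothetical token positions at which each unit starts.
# Paragraphs are independent of chapters: one spans the chapter break at 14.
verse_starts = [0, 5, 9, 14, 20, 26]
chapter_starts = [0, 14]
paragraph_starts = [0, 9, 20]


def unit_index(position: int, starts: List[int]) -> int:
    """Index of the unit (verse, chapter, ...) containing a token position."""
    return bisect_right(starts, position) - 1


def within(p: int, q: int, starts: List[int], n: int) -> bool:
    """True if both positions fall inside a window of n consecutive units."""
    return abs(unit_index(p, starts) - unit_index(q, starts)) < n


def matches(p: int, q: int) -> bool:
    """[within 3 verses and (1 chapter or 1 paragraph)]"""
    return within(p, q, verse_starts, 3) and (
        within(p, q, chapter_starts, 1) or within(p, q, paragraph_starts, 1)
    )


# Positions 10 and 15 are in adjacent verses and different chapters,
# but share a paragraph, so the query matches.
near = matches(10, 15)
# Positions 3 and 22 are more than 3 verses apart, so it does not.
far = matches(3, 22)
```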
15. Demos
• 2nd generation e-ink Kindle
  • KJV surface forms search
  • Entire app limited to 2 MB of memory
  • Java
• Old Android tablet
  • Greek lemmas search
  • Java
• Laptop
  • ESV surface forms search
  • C++