Search in the Biblical Domain - BibleTech: 2011

Covers techniques for searching the Bible using multiple translations and searching extra-biblical content like commentaries and journals.

Transcript

  • 1. Search in the Biblical Domain - Brian Seagraves (Bible.org)
  • 2-8. What is “Search”?
      • Information/Document Retrieval
      • Basic Definition:
          • Finding previously seen documents that are related to some user-supplied terms.
      • Advanced Definition:
          • Finding relevant content for some query by understanding the contextual meaning of terms in the search index and query.
          • Semantic Search
  • 9-12. Types and Sources of Content
      • The Bible and its verses
      • Articles, Journals, and other extra-biblical content
      • The web
  • 13-17. Information Retrieval Engines
      • Sphinx - http://sphinxsearch.com
      • Lucene - http://lucene.apache.org/
          • Solr - http://lucene.apache.org/solr/
      • MySQL Fulltext Search - kinda
  • 18-24. Solr
      • Open Source
      • Full-text search
      • Hit Highlighting
      • Facets
      • Java
      • REST-like HTTP/XML and JSON APIs
  • 25-30. Solr Documents
      • A document represents a distinct piece of content that can be stored/retrieved
          • Bible Verse
          • Journal Article
          • Commentary Chapter/Section
          • Web Page
  • 31-39. Solr Documents
      • Documents have one or more Fields
      • Fields have types
          • Integer
          • Float
          • String
          • Text
          • Date
          • and More!
  • 40-45. Solr Fields
      • Field Types can have:
          • Filters
              • Remove parts of the content
          • Tokenizers
              • Split content into chunks/tokens
  • 46-50. Solr Fields
      • The “String” Field Type
      • <fieldType name="string" class="solr.StrField" />
      • No Filter; No Tokenizer
          • Field content won’t be split or changed
  • 51. Sample Schema
      <fieldtype name="html_text" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldtype>
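To make the tokenizer/filter chain above more concrete, here is a minimal Python sketch of the same idea: strip markup, split on whitespace, lowercase, drop stop words, and reduce words to a rough stem. It is a toy under stated assumptions, not Solr's implementation: the stop-word list and the `toy_stem` suffix rule are invented for illustration (Solr's Porter stemmer is far more careful), and lowercasing is done before stop-word removal for simplicity.

```python
import re

# Assumed toy stop-word list; Solr reads its list from a configurable file.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def toy_stem(token):
    """Crude suffix stripping. This is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize on whitespace, then apply the filters in order."""
    tokens = strip_html(text).split()
    tokens = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


indexed = analyze("<p>In the beginning God created the heavens and the earth.</p>")
queried = analyze("Heaven")
print(indexed)
print(queried[0] in indexed)
```

Because the same chain runs at index time and at query time, the query "Heaven" and the indexed word "heavens" end up as the same token and can match.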
  • 52. Sample Schema (cont.)
      <fieldtype name="sint" class="solr.SortableIntField" omitNorms="true" />
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  • 53. Sample Schema (cont.)
      <fields>
        <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
        <field name="all_index" type="html_text" indexed="true" stored="false" />
      </fields>
      <copyField source="net" dest="all_index" />
      <uniqueKey>id</uniqueKey>
      <defaultSearchField>all_index</defaultSearchField>
      <solrQueryParser defaultOperator="OR" />
  • 54-58. Put Data in Solr
      • Remember, Solr communicates using XML over HTTP
      • No concept of updating a document - delete, then add
      • To add, POST XML to update handler
          • http://localhost:8080/solr/bible/update
  • 59. Add XML
      <add>
        <doc>
          <field name="id">1</field>
          <field name="net">In the beginning God created the heavens and the earth.</field>
        </doc>
      </add>
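As a rough illustration of the add step, the sketch below builds an add message with Python's standard library and prepares (but does not send) the POST request. Solr's XML update format wraps each value in a `<field name="...">` element. The helper names `build_add_xml` and `make_update_request` are hypothetical; the update URL is the one from the slide.

```python
import urllib.request
import xml.etree.ElementTree as ET

UPDATE_URL = "http://localhost:8080/solr/bible/update"  # URL from the slide


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def make_update_request(xml_payload, url=UPDATE_URL):
    """Prepare the POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


payload = build_add_xml(
    [{"id": 1, "net": "In the beginning God created the heavens and the earth."}]
)
request = make_update_request(payload)
print(payload)
```

Passing `request` to `urllib.request.urlopen` would send it to a running Solr instance. Added documents only become searchable after a commit, which can be issued by POSTing `<commit/>` to the same handler.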
  • 60. PHP API
      • No XML!
      $client = new SolrClient($options); // $options: connection settings (hostname, port, path)
      $doc = new SolrInputDocument();
      $doc->addField('id', 1); // Must be Integer
      $doc->addField('net', 'In the beginning God created the heavens and the earth.');
      $client->addDocument($doc);
  • 61-66. Querying Solr
      • HTTP GET Request
      • http://localhost:8080/solr/bible3/select?q=god
          • Path to Solr: http://localhost:8080/solr/
          • Core: bible3
          • Handler: select
          • Query: q=god
      • Returns XML By Default
      • Can return JSON and more
  • 67-73. Querying Solr
      • Queries the defaultSearchField by default
          • <defaultSearchField>all_index</defaultSearchField>
      • Can query other fields by using the syntax: field:value
          • http://localhost:8080/solr/bible3/select?q=id:27974
      • Multiple queries / Booleans
          • http://localhost:8080/solr/bible3/select?q=god AND book:40
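The query URLs above can be assembled programmatically so that spaces and colons are escaped correctly. This is a small sketch using only the standard library; `select_url` is a hypothetical helper, and `wt=json` is Solr's response-format parameter for JSON output.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8080/solr"  # path to Solr, as in the slides


def select_url(core, query, **params):
    """Build a select-handler URL for the given core and query string."""
    query_string = urlencode({"q": query, **params})
    return f"{SOLR_BASE}/{core}/select?{query_string}"


simple = select_url("bible3", "god")
boolean_json = select_url("bible3", "god AND book:40", wt="json")
print(simple)
print(boolean_json)
```

`urlencode` turns the spaces in `god AND book:40` into `+` and the colon into `%3A`, which Solr decodes back to the original query.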
  • 74-79. Search Multiple Translations (Fields)
      • Let’s add some fields: kjv and kjv_index
      • Add some copy field directives:
          <copyField source="kjv" dest="all_index" />
          <copyField source="kjv" dest="kjv_index" />
      • Query: “Shew Thyself”
          • 0 Results in the NET
            http://localhost:8080/solr/bible3/select?q=shew%20theyself
          • 360 Results in the Combined index/field
            http://localhost:8080/solr/bible4/select?q=shew%20theyself
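A toy model may help show why a combined field finds verses that a single translation misses. The sketch below imitates copyField by pouring the tokens of each source field into one or more destination sets. It does no stemming, so "heaven" (KJV wording) and "heavens" (NET wording) stay distinct. The `net_index` destination and the helper names are assumptions for illustration.

```python
def tokens(text):
    """Lowercased word set with surrounding punctuation removed."""
    return {word.strip(".,;:").lower() for word in text.split()}


# (source, dest) pairs, mirroring <copyField source=... dest=... />
COPY_FIELDS = [
    ("net", "all_index"),
    ("net", "net_index"),  # assumed per-translation index for the NET
    ("kjv", "all_index"),
    ("kjv", "kjv_index"),
]


def build_index(doc, copy_fields):
    """Return {dest_field: set of tokens} for a single document."""
    index = {}
    for source, dest in copy_fields:
        index.setdefault(dest, set()).update(tokens(doc[source]))
    return index


verse = {
    "id": 1,
    "net": "In the beginning God created the heavens and the earth.",
    "kjv": "In the beginning God created the heaven and the earth.",
}
index = build_index(verse, COPY_FIELDS)
print("heaven" in index["net_index"], "heaven" in index["all_index"])
```

A query for the KJV wording misses the NET-only index but hits the combined field, which is the effect the "Shew Thyself" example demonstrates at full scale.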
  • 80-85. Search Multiple Translations
      • + Quasi Synonym term/phrase injection
      • + Less variation across translations leads to stronger possible matches
      • + Matches verses when the source translation isn’t known
      • - No control over which translation gets more weight
      • - No control over scoring of matches
  • 86. Search Multiple Translations
      • Another way: Dismax
      • Can score a document (verse) match based on scores/matches from multiple fields.
      • net_index^1 kjv_index^1
          • Not exponents - weights
          • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.
      • net_index^6 kjv_index^.5
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
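The way DisMax combines per-field scores can be sketched in a few lines: take the best weighted field score and add a `tie` fraction of the remaining ones. This is a simplification that treats the boost as a plain multiplier on an already computed field score; in Lucene the boost enters through the query weights. The function name and the sample scores are made up for illustration.

```python
def dismax_score(field_scores, boosts, tie=0.1):
    """Best weighted field score plus tie * (sum of the other field scores)."""
    weighted = [
        score * boosts.get(field, 1.0) for field, score in field_scores.items()
    ]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


# Made-up raw scores for one verse in each translation's index.
scores = {"net_index": 2.0, "kjv_index": 1.0}

equal = dismax_score(scores, {"net_index": 1.0, "kjv_index": 1.0})
net_heavy = dismax_score(scores, {"net_index": 6.0, "kjv_index": 0.5})
print(equal, net_heavy)
```

With `tie=0` only the best field counts; with `tie=1` the field scores are simply summed. The value `.1` used in the URLs sits close to the "best field wins" end.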
  • 87-95. Scoring
      • $\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t,d) \right)$
      • Basic Factors
          • Term Frequency in a document (↑ is better)
          • Term Frequency in Corpus (↓ is Better)
          • Length of matching document (↓ is Better)
              • “Jesus Wept” - John 11:35
              • http://localhost:8080/solr/bible3/select?q=wept
      • http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
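The scoring factors above can be tried out with a small Python sketch. It uses the classic Lucene default choices for the individual factors (tf as the square root of the term count, idf as 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(number of terms)) and leaves out queryNorm, which is constant for a given query and does not change the ranking. The token lists are illustrative, not an actual index.

```python
import math


def tf(freq):
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_tokens, corpus):
    """coord * sum over matching terms of tf * idf^2 * norm (queryNorm omitted)."""
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue
        matched += 1
        doc_freq = sum(1 for doc in corpus if term in doc)
        total += tf(freq) * idf(doc_freq, len(corpus)) ** 2 * length_norm(len(doc_tokens))
    coord = matched / len(query_terms)
    return coord * total


# Illustrative token lists: one very short verse and two longer ones.
short_verse = ["jesus", "wept"]
long_verse = ["and", "he", "went", "out", "and", "wept", "bitterly"]
other_verse = ["in", "the", "beginning", "god", "created"]
corpus = [short_verse, long_verse, other_verse]

short_score = score(["wept"], short_verse, corpus)
long_score = score(["wept"], long_verse, corpus)
print(short_score, long_score)
```

Both verses contain "wept" once, so the only difference is the length norm, and the two-word verse scores higher. That is the "Jesus Wept" effect the slide points to.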
  • 105-110. Topic Tagging
      • Use a topically-tagged Bible/concordance to mark up each verse, or just key verses
      • Helpful for “theme” based queries.
          • “Social Justice” - no good matches
          • “Satan” - Many Names
          • Name Tagging in general can be very helpful
  • 111-116. Searching Strong’s
      • Add a field for Strong’s: strongs_index
      • 1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198
      • Most of the benefits of text searching
          • “Word” frequency
          • Document vs. corpus frequency of search terms
  • 117-122. Searching Articles
      • Similar approach to text-based queries
          • Stem words
          • Use Synonyms
          • Remove Stop Words
      • Without manual tagging, there’s no automatic way to index/search by Bible Reference
  • 123-126. Searching Articles
      • Article contains reference: “John 3”
      • User searches for “John 3:16” or “John 2-4”
      • Results: no meaningful matches at best (unless the documents match the query “John”)
  • 127-132. Searching Articles
      • Solr-based Solutions:
          • Identify and index references and their composite verses using a grammar.
          • John 1:1-3 -> John 1:1; John 1:2; John 1:3
          • Store in a multivalued field - each reference is a “term”
          • Must also parse and expand references in queries in order to match
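A minimal sketch of the expansion step might look like the following. It uses a single regular expression rather than a full grammar and only handles a verse or a verse range inside one chapter. Whole-chapter references ("John 3") and cross-chapter ranges ("John 2-4") would need a table of verse counts per chapter, which is not shown. `expand_reference` is a hypothetical helper name.

```python
import re

# Book name (optionally prefixed by 1-3), chapter, verse, optional end verse.
REF_PATTERN = re.compile(
    r"^(?P<book>[1-3]?\s?[A-Za-z]+)\s+(?P<chapter>\d+):(?P<start>\d+)(?:-(?P<end>\d+))?$"
)


def expand_reference(ref):
    """Expand 'John 1:1-3' into one entry per verse."""
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book = match.group("book")
    chapter = int(match.group("chapter"))
    start = int(match.group("start"))
    end = int(match.group("end") or start)
    if end < start:
        raise ValueError(f"descending verse range: {ref!r}")
    return [f"{book} {chapter}:{verse}" for verse in range(start, end + 1)]


article_terms = set(expand_reference("John 1:1-3"))  # indexed with the article
query_terms = set(expand_reference("John 1:2"))      # expanded from the user query
print(sorted(article_terms))
print(bool(article_terms & query_terms))
```

Each expanded verse becomes one value of the multivalued field, and a query matches when its own expanded verses overlap with the values stored for an article.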
  • 133-139. Searching Articles
      • Relational database-based solution:
          • Assign an id to every verse
          • Store: id, articleId, verseId
          • Parse user query to ids.
          • SELECT COUNT(id) WHERE verseId IN (ID_LIST) GROUP BY articleId
          • Higher count -> Article is more likely to be about that reference than other articles with a lower count
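The relational approach can be tried end to end with SQLite. The table name `article_verse`, the sample ids, and the `rank_articles` helper are assumptions for illustration; the columns follow the slide (id, articleId, verseId), with one row per verse occurrence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_verse ("
    "id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)"
)
# Made-up data: article 10 cites verse 101 once and verse 102 twice;
# article 20 cites verses 102 and 300 once each.
occurrences = [(10, 101), (10, 102), (10, 102), (20, 102), (20, 300)]
conn.executemany(
    "INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)", occurrences
)


def rank_articles(conn, verse_ids):
    """Return (articleId, hits) ordered by how often the verses occur."""
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        "SELECT articleId, COUNT(id) AS hits FROM article_verse "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


ranking = rank_articles(conn, [101, 102, 103])
print(ranking)
```

The user's reference is first parsed into verse ids (here 101 to 103); the article with the most matching occurrences comes first.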
  • 140-147. Searching Articles
      • Relational database-based solution:
          • Large number of rows.
          • 15,000 Journal articles have > 9,000,000 rows (verse occurrences)
          • Can store id, articleId, verseId, count
          • Then SUM() the counts for each articleId.
          • Negligibly faster.
          • Only approx. 3,000,000 rows
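The row-reduction variant can be sketched by collapsing repeated (articleId, verseId) occurrences into one row with a count and then summing the counts per article. The table name `article_verse_count` and the sample data are assumptions for illustration.

```python
import sqlite3
from collections import Counter

# Made-up occurrence rows as (articleId, verseId); duplicates are repeat citations.
occurrences = [(10, 101), (10, 102), (10, 102), (20, 102), (20, 300)]
counted = Counter(occurrences)  # one entry per distinct (articleId, verseId)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_verse_count ("
    'id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER, "count" INTEGER)'
)
conn.executemany(
    'INSERT INTO article_verse_count (articleId, verseId, "count") VALUES (?, ?, ?)',
    [(article, verse, n) for (article, verse), n in counted.items()],
)


def rank_articles(conn, verse_ids):
    """Same ranking as counting occurrence rows, but via SUM of stored counts."""
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        'SELECT articleId, SUM("count") AS hits FROM article_verse_count '
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


row_count = conn.execute("SELECT COUNT(*) FROM article_verse_count").fetchone()[0]
ranking = rank_articles(conn, [101, 102, 103])
print(row_count, ranking)
```

The ranking is unchanged while the table holds fewer rows (4 instead of 5 in this toy data), which mirrors the drop from over 9,000,000 to roughly 3,000,000 rows reported on the slide.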
  • 148-153. Heterogeneous Indexes
      • Not all content is created equal.
      • Content quality and its effect on the quality of your results become a factor when you move from one resource to more than one
          • One Bible, One website, One Journal
      • Apply a field or document boost to help normalize results
      • Some content gets bumped up and some down
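The effect of a per-source boost can be sketched outside Solr as a multiplier applied to each hit's score before sorting. In Solr the boost would normally be set on the field or document at index time, or in the query. The boost values, source labels, and hit scores below are invented for illustration.

```python
# Assumed boost per content source; the values would need tuning against real results.
SOURCE_BOOST = {"bible": 2.0, "journal": 1.0, "web": 0.5}


def rerank(hits):
    """Multiply each raw score by its source boost and sort best first."""
    boosted = [
        (hit["id"], hit["score"] * SOURCE_BOOST.get(hit["source"], 1.0))
        for hit in hits
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Made-up hits with raw relevance scores from a mixed index.
hits = [
    {"id": "web-1", "source": "web", "score": 3.0},
    {"id": "verse-1", "source": "bible", "score": 1.0},
    {"id": "article-1", "source": "journal", "score": 1.8},
]
reranked = rerank(hits)
print(reranked)
```

The web page has the highest raw score but drops to last once the boosts are applied, while the verse moves to the top: some content gets bumped up and some down.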