Search in the Biblical Domain
Brian Seagraves (Bible.org)
Searching Articles
• Similar approach to text-based queries
  • Stem words
  • Use Synonyms
  • Remove Stop Words
• Without manual...

Searching Articles
• Article contains reference: “John 3”
• User searches for “John 3:16” or “John 2-4”
• Results: no meaning...

Searching Articles
• Solr-based Solutions:
  • Identify and index references and their composite verses using a grammar.
  • ...
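A minimal sketch of what "identify references using a grammar" could look like, written in Python rather than as Solr configuration. The regular expression stands in for a real grammar, and `REFERENCE`, `MAX_VERSE`, `parse_reference`, and `overlaps` are hypothetical names introduced only for this illustration. It shows why an article that cites “John 3” should match a search for “John 3:16” or “John 2-4”: each reference becomes a verse range, and ranges can be tested for overlap.

```python
import re

# Assumed mini-grammar: "<book> <chapter>", "<book> <chapter>:<verse>",
# "<book> <chapter>-<chapter>" or "<book> <chapter>:<verse>-<verse>".
REFERENCE = re.compile(
    r"^(?P<book>[1-3]?\s?[A-Za-z]+)\s+(?P<chapter>\d+)"
    r"(?::(?P<verse>\d+))?(?:-(?P<end>\d+))?$"
)

MAX_VERSE = 999  # stand-in for "last verse of the chapter"


def parse_reference(text):
    """Return (book, (start_chapter, start_verse), (end_chapter, end_verse))."""
    match = REFERENCE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {text!r}")
    book = match.group("book")
    chapter = int(match.group("chapter"))
    verse = match.group("verse")
    end = match.group("end")
    if verse is not None:
        # "John 3:16" or "John 3:16-18": the range stays inside one chapter.
        start = (chapter, int(verse))
        stop = (chapter, int(end) if end else int(verse))
    else:
        # "John 3" or "John 2-4": whole chapters.
        start = (chapter, 1)
        stop = (int(end) if end else chapter, MAX_VERSE)
    return book, start, stop


def overlaps(ref_a, ref_b):
    """True when two references in the same book share at least one verse."""
    book_a, start_a, stop_a = parse_reference(ref_a)
    book_b, start_b, stop_b = parse_reference(ref_b)
    return book_a == book_b and start_a <= stop_b and start_b <= stop_a
```

With this, `overlaps("John 3", "John 3:16")` and `overlaps("John 3", "John 2-4")` are both true, while `overlaps("John 3", "John 5")` is false. A production version would need real per-chapter verse counts and book-name aliases.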
Searching Articles
• Relational database-based solution:
  • Assign an id to every verse
  • Store: id, articleId, ...

Searching Articles
• Relational database-based solution:
  • Large number of rows.
  • 15,000 Journal articles have > 9,000,000...

Heterogeneous Indexes
• Not all content is created equal.
• Content quality and its effect on the quality of your r...
Search in the Biblical Domain - BibleTech: 2011
Covers techniques for searching the Bible using multiple translations and searching extra-biblical content like commentaries and journals.

Published in: Technology
  • Transcript of "Search in the Biblical Domain - BibleTech: 2011"

    Search in the Biblical Domain
    Brian Seagraves (Bible.org)

    What is “Search”?
    • Information/Document Retrieval
    • Basic Definition:
      • Finding previously seen documents that are related to some user-supplied terms.
    • Advanced Definition:
      • Finding relevant content for some query by understanding the contextual meaning of terms in the search index and query.
      • Semantic Search

    Types and Sources of Content
    • The Bible and its verses
    • Articles, Journals, and other extra-biblical content
    • The web

    Information Retrieval Engines
    • Sphinx - http://sphinxsearch.com
    • Lucene - http://lucene.apache.org/
      • Solr - http://lucene.apache.org/solr/
    • MySQL Fulltext Search - kinda

    Solr
    • Open Source
    • Full-text search
    • Hit Highlighting
    • Facets
    • Java
    • REST-like HTTP/XML and JSON APIs

    Solr Documents
    • A document represents a distinct piece of content that can be stored/retrieved
      • Bible Verse
      • Journal Article
      • Commentary Chapter/Section
      • Web Page

    Solr Documents
    • Documents have one or more Fields
    • Fields have types
      • Integer
      • Float
      • String
      • Text
      • Date
      • and More!

    Solr Fields
    • Field Types can have:
      • Filters
        • Remove parts of the content
      • Tokenizers
        • Split content into chunks/tokens
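To make the tokenizer/filter split concrete, here is a small Python sketch of an analysis chain. It is not Solr code: the function names and the tiny stop-word list are assumptions for illustration, and Solr's own factories do considerably more.

```python
import re

# Hypothetical stop-word list; Solr loads its list from configuration.
STOP_WORDS = {"the", "and", "in", "of", "a"}


def whitespace_tokenizer(text):
    """Tokenizer: split content into chunks/tokens on whitespace."""
    return text.split()


def strip_punctuation(tokens):
    """Filter: drop leading/trailing punctuation, discard empty tokens."""
    cleaned = (re.sub(r"^\W+|\W+$", "", tok) for tok in tokens)
    return [tok for tok in cleaned if tok]


def lowercase_filter(tokens):
    """Filter: normalize case so 'God' and 'god' match."""
    return [tok.lower() for tok in tokens]


def stop_filter(tokens):
    """Filter: remove very common words."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def analyze(text):
    """Run the tokenizer first, then each filter in order."""
    tokens = whitespace_tokenizer(text)
    for token_filter in (strip_punctuation, lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("In the beginning God created the heavens and the earth.")
```

Here `tokens` ends up as `["beginning", "god", "created", "heavens", "earth"]`. The same chain would normally be applied to both indexed content and incoming queries so that the two sides produce comparable tokens.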
    Solr Fields
    • The “String” Field Type
    • <fieldType name="string" class="solr.StrField" />
    • No Filter; No Tokenizer
      • Field content won’t be split or changed

    Sample Schema
    <fieldtype name="html_text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" />
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

    Sample Schema (cont.)
    <fieldtype name="sint" class="solr.SortableIntField" omitNorms="true" />
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    Sample Schema (cont.)
    <fields>
      <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
      <field name="all_index" type="html_text" indexed="true" stored="false" />
    </fields>
    <copyField source="net" dest="all_index" />
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>all_index</defaultSearchField>
    <solrQueryParser defaultOperator="OR" />

    Put Data in Solr
    • Remember, Solr communicates using XML over HTTP
    • No concept of updating a document - delete, then add
    • To add, POST XML to update handler
      • http://localhost:8080/solr/bible/update

    Add XML
    <add>
      <doc>
        <field name="id">1</field>
        <field name="net">In the beginning God created the heavens and the earth.</field>
      </doc>
    </add>
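Solr's XML update messages carry each value in a `<field name="...">` element inside `<doc>`. The following Python sketch builds such a message with the standard library. `build_add_xml` and `post_to_solr` are hypothetical helper names, and the POST helper is defined but not called here because it needs a running Solr instance.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of {field_name: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(update_url, payload):
    """POST the XML payload to an update handler and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


payload = build_add_xml(
    [{"id": 1, "net": "In the beginning God created the heavens and the earth."}]
)
# Example call, assuming the update URL from the slides:
# post_to_solr("http://localhost:8080/solr/bible/update", payload)
```

Added documents typically become searchable only after a commit, so a real loader would follow the adds with a commit message.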
    PHP API
    • No XML!
    $client = new SolrClient($options);
    $doc = new SolrInputDocument();
    $doc->addField('id', 1); // Must be an integer
    $doc->addField('net', 'In the beginning God created the heavens and the earth.');
    $client->addDocument($doc);

    Querying Solr
    • HTTP GET Request
    • http://localhost:8080/solr/bible3/select?q=god
      • Path to Solr: http://localhost:8080/solr
      • Core: bible3
      • Handler: select
      • Query: q=god
    • Returns XML By Default
    • Can return JSON and more
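The four URL parts called out on the slide (path to Solr, core, handler, query) can be assembled programmatically. This is a minimal Python sketch; `select_url` is a hypothetical helper, and `wt=json` is Solr's response-writer parameter for asking for JSON instead of the default XML.

```python
from urllib.parse import urlencode


def select_url(base, core, query, handler="select", **params):
    """Assemble path to Solr + core + handler + URL-encoded query string."""
    query_string = urlencode({"q": query, **params})
    return f"{base}/{core}/{handler}?{query_string}"


xml_url = select_url("http://localhost:8080/solr", "bible3", "god")
json_url = select_url("http://localhost:8080/solr", "bible3", "god", wt="json")
```

`xml_url` reproduces the URL from the slide, and `json_url` is the same request with `&wt=json` appended.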
    Querying Solr
    • Queries the defaultSearchField by default
      • <defaultSearchField>all_index</defaultSearchField>
    • Can query other fields by using the syntax field:value
      • http://localhost:8080/solr/bible3/select?q=id:27974
    • Multiple queries / Booleans
      • http://localhost:8080/solr/bible3/select?q=god AND book:40
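A query such as `god AND book:40` contains spaces, so it has to be percent-encoded before it goes into a URL. The Python sketch below composes field and boolean clauses and encodes the result; `field_query` and `all_of` are hypothetical helper names, not part of any Solr client.

```python
from urllib.parse import quote


def field_query(field, value):
    """Build a field:value clause."""
    return f"{field}:{value}"


def all_of(*clauses):
    """Join clauses with AND so every clause must match."""
    return " AND ".join(clauses)


q = all_of("god", field_query("book", 40))
# Keep ':' readable; spaces become %20.
url = "http://localhost:8080/solr/bible3/select?q=" + quote(q, safe=":")
```

Here `q` is `god AND book:40` and `url` ends in `q=god%20AND%20book:40`.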
    Search Multiple Translations (Fields)
    • Let’s add some fields: kjv and kjv_index
    • Add some copy field directives:
      <copyField source="kjv" dest="all_index" />
      <copyField source="kjv" dest="kjv_index" />
    • Query: “Shew Thyself”
      • 0 Results in the NET
        http://localhost:8080/solr/bible3/select?q=shew%20theyself
      • 360 Results in the Combined index/field
        http://localhost:8080/solr/bible4/select?q=shew%20theyself
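The effect of the copyField directives can be imitated with a toy inverted index in Python. Everything here is an assumption made for illustration: the `COPY_FIELDS` table (including a `net_index` destination), the helper names, and the two short strings, which are abbreviated stand-ins rather than exact verse text.

```python
from collections import defaultdict

# Assumed copy rules: source field -> destination index fields.
COPY_FIELDS = {
    "net": ["all_index", "net_index"],
    "kjv": ["all_index", "kjv_index"],
}


def build_index(docs):
    """Return {index_field: {term: set of doc ids}}, applying the copy rules."""
    index = defaultdict(lambda: defaultdict(set))
    for doc in docs:
        for source, destinations in COPY_FIELDS.items():
            for raw in doc.get(source, "").lower().split():
                term = raw.strip(".,;:")
                if not term:
                    continue
                for dest in destinations:
                    index[dest][term].add(doc["id"])
    return index


def search(index, field, query):
    """OR search: a document matches if it contains any query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index[field].get(term, set())
    return hits


# Illustrative strings only, not exact translation text.
docs = [{"id": 1, "net": "present yourself to Ahab", "kjv": "shew thyself unto Ahab"}]
index = build_index(docs)
net_hits = search(index, "net_index", "shew thyself")
combined_hits = search(index, "all_index", "shew thyself")
```

`net_hits` is empty while `combined_hits` contains document 1, which mirrors the 0-versus-360 result on the slide: archaic KJV vocabulary only matches once the KJV text has been copied into the combined field.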
    Search Multiple Translations
    • + Quasi Synonym term/phrase injection
    • + Less variation across translations leads to stronger possible matches
    • + Matches verses when the source translation isn’t known
    • - No control over which translation gets more weight
    • - No control over scoring of matches

    Search Multiple Translations
    • Another way: Dismax
    • Can score a document (verse) match based on scores/matches from multiple fields.
    • net_index^1 kjv_index^1
      • Not exponents - weights
      • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.
    • net_index^6 kjv_index^.5
    • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score
    • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
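A disjunction-max query takes the best per-field score and adds the remaining field scores scaled by the tie breaker. The Python sketch below shows that arithmetic with the `qf` weights and `tie=.1` from the URLs above. The per-field raw scores are made-up numbers, `dismax_score` is a hypothetical helper, and query normalization is ignored.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Combine per-field scores: max + tie * (sum of the other scores).

    field_scores: raw score of the document in each field
    boosts:       weight per field, as in qf=net_index^6 kjv_index^.5
    """
    boosted = [field_scores.get(field, 0.0) * weight for field, weight in boosts.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


# Made-up raw scores for one verse: a weak NET match and a strong KJV match.
raw = {"net_index": 0.5, "kjv_index": 2.0}

equal_weights = dismax_score(raw, {"net_index": 1, "kjv_index": 1}, tie=0.1)
net_favored = dismax_score(raw, {"net_index": 6, "kjv_index": 0.5}, tie=0.1)
```

With equal weights the KJV match dominates (about 2.05). With `net_index^6 kjv_index^.5` the NET match dominates instead (about 3.1), which is the control over translation weight that the single combined field could not give.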
    Scoring
    • score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · norm(t,d) )
    • Basic Factors
      • Term Frequency in a document (↑ is better)
      • Term Frequency in Corpus (↓ is better)
      • Length of matching document (↓ is better)
        • “Jesus Wept” - John 11:35
        • http://localhost:8080/solr/bible3/select?q=wept
    • http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
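The three basic factors can be tried out with a short Python sketch. It uses the definitions from Lucene's classic default similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), length norm = 1 / sqrt(numTerms)) and leaves out coord and queryNorm. The token lists are synthetic, and the function names are hypothetical.

```python
import math


def tf(freq):
    """Term frequency factor: more occurrences in the document is better."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer in the corpus is better."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Length normalization: shorter matching documents score higher."""
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_tokens, corpus):
    """Sum tf * idf^2 * norm over the query terms found in the document."""
    total = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for tokens in corpus if term in tokens)
        total += tf(freq) * idf(doc_freq, len(corpus)) ** 2 * length_norm(len(doc_tokens))
    return total


# Synthetic documents: a two-word verse and a long verse that also has "wept".
short_verse = ["jesus", "wept"]
long_verse = ["wept"] + ["filler"] * 19
other_verse = ["in", "the", "beginning"]
corpus = [short_verse, long_verse, other_verse]

short_score = score(["wept"], short_verse, corpus)
long_score = score(["wept"], long_verse, corpus)
```

Both documents contain “wept” once and share the same idf, so the only difference is the length norm: `short_score` is larger than `long_score`, which is why a query for “wept” tends to rank John 11:35 first.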
    96. 96. Search Multiple Translations
    97. 97. Search Multiple Translations• Another way: Dismax
    98. 98. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.
    99. 99. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1
    100. 100. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1 • Not exponents - weights
    101. 101. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1 • Not exponents - weights • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.
    102. 102. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1 • Not exponents - weights • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.• net_index^6 kjv_index^.5
    103. 103. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1 • Not exponents - weights • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.• net_index^6 kjv_index^.5• http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=. 1&qf=net_index^1%20kjv_index^1&fl=score
    104. 104. Search Multiple Translations• Another way: Dismax• Can score a document (verse) match based on scores/matches from multiple fields.• net_index^1 kjv_index^1 • Not exponents - weights • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.• net_index^6 kjv_index^.5• http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score• http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
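To make the effect of the weights and the tie parameter concrete, here is a minimal sketch of how a disjunction-max score can be combined for a single query term: the best weighted field score, plus tie times the sum of the remaining field scores. This is a simplification of what Solr does internally (it ignores query normalization and multi-term queries), and the raw per-field scores are hypothetical values, not output from the bible4 index.

```python
def dismax_score(field_scores, boosts, tie=0.1):
    """Combine per-field scores for one term, dismax style.

    field_scores: raw score of the term in each field (hypothetical values).
    boosts: per-field weight, as in qf=net_index^6 kjv_index^.5.
    tie: how much the non-winning fields contribute (0 = winner only).
    """
    weighted = [score * boosts.get(field, 1.0)
                for field, score in field_scores.items()]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


# Hypothetical raw scores for one verse in the two translation fields.
raw = {"net_index": 0.8, "kjv_index": 0.5}

equal_weights = dismax_score(raw, {"net_index": 1.0, "kjv_index": 1.0})
net_favoured = dismax_score(raw, {"net_index": 6.0, "kjv_index": 0.5})

print(equal_weights, net_favoured)
```

Raising the net_index weight makes a match in that translation dominate the verse's score, while the tie value keeps a small contribution from the other translation.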
    110. 110. Topic Tagging• Use a topically-tagged Bible/concordance to mark up each verse, or just key verses• Helpful for “theme” based queries. • “Social Justice” - no good matches • “Satan” - Many Names • Name Tagging in general can be very helpful
    116. 116. Searching Strong’s• Add a field for Strong’s: strongs_index• 1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198• Most of the benefits of text searching • “Word” frequency • Document vs. corpus frequency of search terms
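The "word" frequency point can be seen directly on the example field value above: once the Strong's numbers are treated as whitespace-separated tokens, they can be counted like any other terms. The sketch below only illustrates that idea in plain Python; it is not how Solr stores or counts terms internally.

```python
from collections import Counter

# The strongs_index value shown on the slide, treated as plain tokens.
strongs_index = ("1473 1510 2316 11 2316 2464 2532 2316 "
                 "2384 1510 3756 2316 3498 235 2198")

tokens = strongs_index.split()
term_counts = Counter(tokens)

# The most frequent Strong's number in this verse and its count.
most_common_number, occurrences = term_counts.most_common(1)[0]
print(most_common_number, occurrences)
```

The same counts, taken per verse and across the whole corpus, are what feed the document versus corpus frequency factors used in scoring.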
    122. 122. Searching Articles• Similar approach to text-based queries • Stem words • Use Synonyms • Remove Stop Words• Without manual tagging, there’s no automatic way to index/search by Bible Reference
    126. 126. Searching Articles• Article contains reference: “John 3”• User searches for “John 3:16” or “John 2-4”• Results: no meaningful matches at best (unless the documents match the query “John”)
    132. 132. Searching Articles• Solr-based Solutions: • Identify and index references and their composite verses using a grammar. • John 1:1-3 -> John 1:1; John 1:2; John 1:3 • Store in a multivalued field - each reference is a “term” • Must also parse and expand references in queries in order to match
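The expansion step above (John 1:1-3 -> John 1:1; John 1:2; John 1:3) might look something like the following sketch. It uses a single regular expression rather than a full grammar, and handles only a verse or verse range within one chapter; the function name and pattern are illustrative assumptions. Chapter-only references ("John 3") and chapter ranges ("John 2-4") would additionally need a table of verse counts per chapter, which is not shown here.

```python
import re

# Book name (optionally prefixed by 1-3), chapter, verse, optional end verse.
REFERENCE = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s)?[A-Za-z]+(?:\s[A-Za-z]+)*)\s+"
    r"(?P<chapter>\d+):(?P<start>\d+)(?:-(?P<end>\d+))?\s*$"
)


def expand_reference(ref):
    """Expand 'John 1:1-3' into ['John 1:1', 'John 1:2', 'John 1:3']."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book = match.group("book")
    chapter = int(match.group("chapter"))
    start = int(match.group("start"))
    end = int(match.group("end") or start)
    if end < start:
        raise ValueError(f"verse range runs backwards: {ref!r}")
    return [f"{book} {chapter}:{verse}" for verse in range(start, end + 1)]


# Each expanded reference would become one value in the multivalued field.
print(expand_reference("John 1:1-3"))
```

Running the same function over the user's query produces terms in the same form as the indexed values, which is what allows the two to match.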
    139. 139. Searching Articles• Relational database-based solution: • Assign an id to every verse • Store: id, articleId, verseId • Parse user query to ids. • SELECT COUNT(id) WHERE verseId IN (ID_LIST) GROUP BY articleId • Higher count -> Article is more likely to be about that reference than other articles with a lower count
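The relational approach above can be tried out with an in-memory SQLite database. In this sketch the table name article_verse, the sample rows, and the verse ids are all hypothetical; the slide's query is completed with a FROM clause and an ORDER BY so that the articles come back ranked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_verse ("
    "id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)"
)

# Hypothetical data: one row per verse occurrence in an article.
occurrences = [(1, 101), (1, 102), (1, 103), (2, 102), (3, 500)]
conn.executemany(
    "INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)",
    occurrences,
)


def rank_articles(conn, verse_ids):
    """Return (articleId, hits) pairs, most hits first."""
    verse_ids = list(verse_ids)
    if not verse_ids:
        return []
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        "SELECT articleId, COUNT(id) AS hits FROM article_verse "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, verse_ids).fetchall()


# Suppose the user's query was parsed into verse ids 101, 102 and 103.
ranked = rank_articles(conn, [101, 102, 103])
print(ranked)
```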
    147. 147. Searching Articles• Relational database-based solution: • Large number of rows. • 15,000 Journal articles have > 9,000,000 rows (verse occurrences) • Can store id, articleId, verseId, count • Then SUM() the counts for each articleId. • Negligibly faster. • Only approx. 3,000,000 rows
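The pre-aggregated variant can be sketched the same way: repeated (articleId, verseId) occurrences are collapsed into a single row with a count, and the query sums those counts instead of counting rows. The table name article_verse_count and the sample occurrences are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw occurrences: (articleId, verseId), with repeats.
raw_occurrences = [
    (1, 101), (1, 101), (1, 102),
    (2, 101), (2, 103), (2, 103), (2, 103),
]

# Collapse repeats: one entry per (articleId, verseId) pair with its count.
aggregated = Counter(raw_occurrences)

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE article_verse_count ('
    'id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER, '
    '"count" INTEGER)'
)
conn.executemany(
    'INSERT INTO article_verse_count (articleId, verseId, "count") '
    'VALUES (?, ?, ?)',
    [(article, verse, n) for (article, verse), n in aggregated.items()],
)


def rank_articles_by_sum(conn, verse_ids):
    """Return (articleId, total occurrences) pairs, highest total first."""
    verse_ids = list(verse_ids)
    if not verse_ids:
        return []
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        'SELECT articleId, SUM("count") AS hits FROM article_verse_count '
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, verse_ids).fetchall()


summed = rank_articles_by_sum(conn, [101, 102])
print(len(raw_occurrences), "occurrences stored as", len(aggregated), "rows")
print(summed)
```

The ranking is the same as counting individual occurrence rows; only the number of stored rows shrinks.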
    153. 153. Heterogeneous Indexes• Not all content is created equal.• Content quality and its effect on the quality of your results become a factor when you move from one resource to more than one • One Bible, One website, One Journal• Apply a field or document boost to help normalize results• Some content gets bumped up and some down
