"Searching with Solr" - Tyler Harms, South Dakota Code Camp 2012
1. Searching with Solr
AN INTRODUCTION
Tyler Harms
Developer
@harmstyler
tyler@blendinteractive.com
Saturday, November 10, 12
2. Why Implement Solr?
SEARCHING WITH SOLR
• Does your site need search?
• Is Google enough?
• Do you need/want to control rankings?
• Just text, or Structured Data?
3. What is Solr?
Solr is a standalone enterprise
search server with a REST-like API.
You put documents in it [...] over
HTTP. You query it via HTTP GET
and receive [...] results.
10. Scaling Solr
• Master/Slave Architecture
• Write to master -> Read from slaves
• Multicore Setup
• Multiple Solr ‘cores’ running alongside each other within the same install
11. Solr’s Data Model
• Solr maintains a collection of documents
• A document is a collection of fields and values
• A field can occur multiple times in a doc
• Documents are immutable
• They can be deleted and replaced by new versions, however.
12. Querying
• HTTP request
• http://localhost:8983/solr/select?q=blend&start=0&rows=10
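A minimal sketch of building that request URL in Python. The base URL is the one from the slide; the function name and defaults are illustrative assumptions, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; adjust host, port, and path for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10, base=SOLR_SELECT):
    """Return a select URL for `query`, paging with `start` and `rows`."""
    params = {"q": query, "start": start, "rows": rows}
    return base + "?" + urlencode(params)


first_page = build_select_url("blend")
second_page = build_select_url("blend", start=10)
```

Paging is just a matter of moving `start` forward by `rows` on each request.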
13. Solr Query Syntax
• blend (value)
• company:blend (field:value)
• title:"Searching with Solr" AND text:apache
• id:[* TO *]
• *:* (all fields : all values)
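The field:value forms above can be assembled programmatically. This is a rough sketch with hypothetical helper names (`term`, `all_of`); it only quotes phrases and does not escape Solr's other special characters.

```python
def term(field, value):
    """Build field:value, quoting the value as a phrase if it has whitespace."""
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


q = all_of(term("title", "Searching with Solr"), term("text", "apache"))
```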
14. Using Solr
• Getting Data into Solr
• Getting Data out of Solr
15. Getting Data into Solr
• POST it
<add>
<doc>
<field name="abstract">Lorem ipsum</field>
<field name="company">Blend Interactive</field>
<field name="text">Lorem Ipsum</field>
<field name="title">Some Title</field>
</doc>
[<doc> ... </doc>[<doc> ... </doc>]]
</add>
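One way to generate the `<add>` message from Python data, sketched with the standard library. The function name is an assumption for illustration; a list value is written as repeated `<field>` elements, since a field can occur multiple times in a document.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts into an <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes several <field> elements with one name.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


payload = build_add_xml([
    {"title": "Some Title", "company": "Blend Interactive"},
])
```

Using an XML library rather than string concatenation means characters such as `&` and `<` in field values are escaped for you.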
18. Committing
• Nothing shows up in the index until you commit
• You can just POST <commit/> to:
• http://<host>:<port>/solr/update
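A sketch of preparing that POST with Python's standard library. The host and port defaults and the `text/xml` content type are assumptions for a local example install; the request is only constructed here, and the commented lines show how it would be sent.

```python
import urllib.request


def build_update_request(body, host="localhost", port=8983):
    """Prepare (but do not send) a POST of an XML message to /solr/update."""
    url = f"http://{host}:{port}/solr/update"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit_request = build_update_request("<commit/>")

# To send it to a running Solr instance:
# with urllib.request.urlopen(commit_request) as response:
#     print(response.status)
```

The same helper can carry an `<add>` or `<delete>` message, since they are posted to the same update URL.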
19. Getting Data out of Solr
• http://localhost:8983/solr/select/?q=solr
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">19</int>
<lst name="params">
<str name="q">solr</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="abstract">
A brief introduction to using Apache Solr for implementing search for your website.
</str>
<str name="django_ct">codecamp.session</str>
<str name="django_id">19</str>
<str name="id">codecamp.session.19</str>
<str name="text">
Searching with Solr: An Introduction A brief introduction to using Apache Solr for
implementing search for your website.
</str>
<str name="title">Searching with Solr: An Introduction</str>
</doc>
</result>
</response>
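A response like this can be read with any XML parser. The sketch below uses a trimmed copy of the response above as sample data and a hypothetical `parse_results` helper; it handles single-valued fields only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown on the slide.
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="django_id">19</str>
      <str name="id">codecamp.session.19</str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (numFound, list of {field name: text}) from an XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    docs = [
        {field.get("name"): (field.text or "").strip() for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


num_found, docs = parse_results(SAMPLE)
```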
23. Getting Data out of Solr: JSON
• http://localhost:8983/solr/select/?q=solr&wt=json
{
"responseHeader": {
"status":0,
"QTime":0,
"params": {
"wt":"json",
"q":"solr"
}
},
"response": {
"numFound":1,
"start":0,
"docs":[{
"django_id":"19",
"title":"Searching with Solr: An Introduction",
"text":"Searching with Solr: An Introduction\nA brief introduction to using Apache Solr for implementing search for your website.",
"abstract":"A brief introduction to using Apache Solr for implementing search for your website.",
"django_ct":"codecamp.session",
"id":"codecamp.session.19"
}]
}
}
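With `wt=json` the response maps directly onto native data structures. A small sketch, using a trimmed copy of the response above as sample data and a hypothetical `titles` helper:

```python
import json

# Trimmed copy of the JSON response shown on the slide.
SAMPLE = """{
  "responseHeader": {"status": 0, "QTime": 0},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [{
      "django_id": "19",
      "title": "Searching with Solr: An Introduction",
      "id": "codecamp.session.19"
    }]
  }
}"""


def titles(response_text):
    """Return the title of every document in a JSON response."""
    data = json.loads(response_text)
    return [doc["title"] for doc in data["response"]["docs"]]


found = titles(SAMPLE)
```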
25. Deleting Data from Solr
• POST it
<delete><id>codecamp.session.19</id></delete>
<delete><query>company:blend</query></delete>
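Both delete messages are small enough to build as strings, as long as the values are XML-escaped. The helper names below are assumptions for illustration.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id):
    """Build a message that deletes the single document with this id."""
    return f"<delete><id>{escape(doc_id)}</id></delete>"


def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    return f"<delete><query>{escape(query)}</query></delete>"


by_id = delete_by_id("codecamp.session.19")
by_query = delete_by_query("company:blend")
```

As with adds, a delete is not visible to searches until a commit follows it.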
26. The Solr Schema
• schema.xml
• Defines ‘types’ used in the webapp
• Defines the fields
• Defines ‘copyfields’
• Read the schema inside the example project for more
27. The Solr Schema
• Types
• Define how a field and query should be processed
• Word Stemming
• Case Folding
• How would you handle a search for ‘C.I.A.’?
• Dates, ints, floats, etc. are defined here as well
• 2 Modes
• Index Time
• Query Time
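To see why the same processing has to run at index time and at query time, here is a toy analysis chain in Python. It is not Solr's analyzer, only an illustration of the idea: case folding plus dropping dots lets a search for 'cia' match text containing 'C.I.A.'.

```python
def analyze(text):
    """Toy analysis chain: split on whitespace, drop dots, fold case."""
    tokens = []
    for raw in text.split():
        token = raw.replace(".", "").lower()
        if token:
            tokens.append(token)
    return tokens


# Index time: the document text is analyzed before it is stored in the index.
indexed = analyze("The C.I.A. report")

# Query time: the user's input goes through the same chain.
queried = analyze("cia")

matches = all(token in indexed for token in queried)
```

If the two chains differed, for example folding case only at index time, a query for 'CIA' would produce a token that never appears in the index.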
31. Fields
• The elements of a document
• Both Predefined and Dynamic
• Fields may occur multiple times
• May be indexed and/or stored
<copyField source="bio" dest="df_text" />
<copyField source="year" dest="century" maxChars="2"/>
2000 would be stored as 20
Useful for custom faceting
39. The Solr Config File
• solrconfig.xml
• Defines request handlers, defaults, & caches
• Read the solrconfig.xml inside the example project for more
40. Other Solr Tools
• Debug Query
• Boost Functions
• Search Faceting
• Search Filters
• Search Highlighting
• Solr Admin
41. Debug Query Option
• Add &debugQuery=on to request parameters
• Returns a parsed form of the query
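A short sketch of switching the debug option on for an existing set of request parameters. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def with_debug(params):
    """Return a copy of `params` with debugQuery=on added."""
    debug_params = dict(params)
    debug_params["debugQuery"] = "on"
    return debug_params


query_string = urlencode(with_debug({"q": "solr"}))
```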
44. Boost Function
• Allows you to influence results at query time
• Really useful for tuning scoring
• You can also boost at index time
q=blend&qf=text^2 company
• More information available: http://wiki.apache.org/solr/SolrRelevancyFAQ
• Can use both dismax and standard query handlers; I use dismax
&bq=text:blend^2
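A sketch of assembling those boost parameters in Python. `qf` and `bq` are dismax parameters; the example sets `defType=dismax` explicitly on the assumption that the request handler is not already configured for dismax. The function name is hypothetical.

```python
from urllib.parse import urlencode


def boosted_query(q, field_boosts=None, boost_query=None):
    """Build a dismax query string with per-field boosts and a boost query."""
    params = [("q", q), ("defType", "dismax")]
    if field_boosts:
        # {"text": 2, "company": 1} becomes "text^2 company".
        qf = " ".join(
            f"{field}^{boost}" if boost != 1 else field
            for field, boost in field_boosts.items()
        )
        params.append(("qf", qf))
    if boost_query:
        params.append(("bq", boost_query))
    return urlencode(params)


by_field = boosted_query("blend", field_boosts={"text": 2, "company": 1})
by_query = boosted_query("blend", boost_query="text:blend^2")
```

`urlencode` percent-encodes the `^` and `:` characters, which is why the query strings look different from the readable form on the slide.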
48. Solr Faceting
• What is a facet?
• “Interaction style where users filter a set of items by
progressively selecting from only valid values of a faceted
classification system” - Keith Instone, SOASIS&T, July 8, 2004
• What does it look like?
• Make sure to use an untokenized field (e.g. string)
• “San Jose” != “san”+“jose”
q=*:*
facet=on
facet.field=company
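A sketch of sending those parameters and reading the counts back. With `wt=json`, Solr by default returns each facet field as a flat list alternating value and count; the sample list below is hypothetical data, not output from the talk's index.

```python
from urllib.parse import urlencode

facet_params = urlencode([
    ("q", "*:*"),
    ("facet", "on"),
    ("facet.field", "company"),
    ("wt", "json"),
])


def facet_counts(flat):
    """Pair up a flat [value, count, value, count, ...] facet list."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical contents of facet_counts -> facet_fields -> company.
sample = ["Blend Interactive", 3, "Example Co", 1]
counts = facet_counts(sample)
```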
50. Solr Filter Query
• Used to narrow your search query
• Restrict the super set of documents that can be returned
• ‘fq’ parameter (short for Filter Query)
q=*:*
fq=company:blend
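A sketch of adding filter queries to a request. `fq` may be repeated, and each filter further restricts the set of documents the main query can return; the helper name is an assumption.

```python
from urllib.parse import urlencode


def filtered_query(q, filters):
    """Build a query string with one fq parameter per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


query_string = filtered_query("*:*", ["company:blend"])
```

A typical use is to turn a facet value the user clicked into an `fq`, leaving the original `q` untouched.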
52. Search Highlighting
• Allow Solr to generate your highlight
hl=true
hl.simple.pre=<b>
hl.simple.post=</b>
hl.fragsize=200
hl.requireFieldMatch=false
hl.fl=text bio title
hl.snippets=1
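A sketch of sending those highlighting parameters and pulling a snippet out of a JSON response. The `highlighting` section is keyed by document id, then by field name, with a list of snippets per field. The sample response is hypothetical and the helper names are assumptions.

```python
from urllib.parse import urlencode

# Highlighting parameters copied from the slide.
HIGHLIGHT_PARAMS = {
    "hl": "true",
    "hl.simple.pre": "<b>",
    "hl.simple.post": "</b>",
    "hl.fragsize": 200,
    "hl.requireFieldMatch": "false",
    "hl.fl": "text bio title",
    "hl.snippets": 1,
}


def highlight_query(q):
    """Build a JSON query string with highlighting switched on."""
    return urlencode({"q": q, "wt": "json", **HIGHLIGHT_PARAMS})


def first_snippet(response, doc_id, field):
    """Return the first highlighted snippet for a document field, or None."""
    snippets = response.get("highlighting", {}).get(doc_id, {}).get(field, [])
    return snippets[0] if snippets else None


# Hypothetical parsed response, shaped like Solr's highlighting section.
sample = {
    "highlighting": {
        "codecamp.session.19": {
            "text": ["A brief introduction to using Apache <b>Solr</b>"]
        }
    }
}
snippet = first_snippet(sample, "codecamp.session.19", "text")
```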
55. Solr Admin
• http://localhost:8983/solr/admin/
• Built in app for testing all search options
• Field Analysis
• Schema Browser
• Full Query Interface
• Solr Statistics
• Solr Information
• Many More Options
56. Solr/Browse
• Test your search configuration using the /browse
requestHandler
57. Resources
• Apache Solr Website
• http://lucene.apache.org/solr/
• Wiki, mailing list, bugs/features
• Books