"Searching with Solr" - Tyler Harms, South Dakota Code Camp 2012
1.
Searching with Solr
AN INTRODUCTION
Tyler Harms
Developer
@harmstyler
tyler@blendinteractive.com
Saturday, November 10, 12
2.
Why Implement Solr?
SEARCHING WITH SOLR
• Does your site need search?
• Is Google enough?
• Do you need/want to control rankings?
• Just text, or Structured Data?
3.
What is Solr?
Solr is a standalone enterprise search server with a REST-like API. You put documents in it [...] over HTTP. You query it via HTTP GET and receive [...] results.
5.
Solr Versions
• Current Version(s)
• Solr 3.6.1
• Solr 4
• Released Versions are always stable
6.
$ wget http://(...)/3.6.1/apache-solr-3.6.1.tgz
$ tar -xzf apache-solr-3.6.1.tgz
$ cd apache-solr-3.6.1/example/
$ java -jar start.jar
(a lot of java log...)
7.
Search Alternatives
• Google
• Lucene
• elasticsearch
• Whoosh
• Xapian
• Many Others
8.
NOT a Database Replacement
• Solr is designed to live alongside your website as a separate web app
10.
Scaling Solr
• Master/Slave Architecture
• Write to master -> Read from slaves
• Multicore Setup
• Multiple Solr ‘cores’ running alongside each other within the same install
11.
Solr’s Data Model
• Solr maintains a collection of documents
• A document is a collection of fields and values
• A field can occur multiple times in a doc
• Documents are immutable
• They can be deleted and replaced by new versions, however.
12.
Querying
• HTTP request
• http://localhost:8983/solr/select?q=blend&start=0&rows=10
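Because a query is just an HTTP GET, the URL above can be assembled from ordinary request parameters. The following is a minimal sketch, assuming the example server from the earlier slide is running on localhost:8983; the helper name `build_select_url` is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL: the example server started with `java -jar start.jar`.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10, base_url=SOLR_SELECT_URL):
    """Return a select URL for `query`, paging with start/rows."""
    params = {"q": query, "start": start, "rows": rows}
    # urlencode percent-escapes characters such as spaces, quotes, and colons.
    return base_url + "?" + urlencode(params)


print(build_select_url("blend"))
```

Passing the URL to `urllib.request.urlopen` would run the search; that step is left out here so the sketch works without a server.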
13.
Solr Query Syntax
• blend (value)
• company:blend (field:value)
• title:"Searching with Solr" AND text:apache
• id:[* TO *]
• *:* (all fields : all values)
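The syntax forms listed above can be composed from small string helpers. This is only a sketch: `term` and `all_of` are hypothetical names, and the quoting covers phrases only, so other special characters in a single-word value are passed through untouched.

```python
def term(field, value):
    """Build a field:value clause, quoting the value when it is a phrase."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        # Escape backslashes and double quotes, then wrap the phrase in quotes.
        text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"{field}:{text}"


def all_of(*clauses):
    """Join clauses with AND so every clause must match."""
    return " AND ".join(clauses)


MATCH_ALL = "*:*"        # all fields : all values
HAS_ID = "id:[* TO *]"   # any document with a value in the id field

print(all_of(term("title", "Searching with Solr"), term("text", "apache")))
```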
14.
Using Solr
• Getting Data into Solr
• Getting Data out of Solr
15.
Getting Data into Solr
• POST it
<add>
  <doc>
    <field name="abstract">Lorem ipsum</field>
    <field name="company">Blend Interactive</field>
    <field name="text">Lorem Ipsum</field>
    <field name="title">Some Title</field>
  </doc>
  [<doc> ... </doc>[<doc> ... </doc>]]
</add>
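An `<add>` message like the one above is usually generated rather than written by hand. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `build_add_xml` and the dict-per-document input shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize an iterable of {field: value} dicts into an <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    # encoding="unicode" returns a str; special characters are XML-escaped.
    return ET.tostring(add, encoding="unicode")


payload = build_add_xml([
    {"title": "Some Title", "company": "Blend Interactive"},
])
print(payload)
```

The resulting string is what would be POSTed to Solr's update URL.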
18.
Committing
• Nothing shows up in the index until you commit
• You can just POST <commit/> to:
• http://<host>:<port>/solr/update
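The commit step can be sketched as a plain POST built with the standard library. The host and port below are assumptions (the slide leaves them as placeholders), and `build_update_request` is a hypothetical helper; it only constructs the request, so nothing is sent until it is passed to `urlopen`.

```python
from urllib.request import Request

# Assumed host and port; the slide leaves them as <host>:<port>.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, url=UPDATE_URL):
    """Wrap an XML update message (<add>, <delete>, <commit/>) in a POST."""
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit = build_update_request("<commit/>")
print(commit.get_method(), commit.full_url)
# To actually send it: urllib.request.urlopen(commit)
```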
19.
Getting Data out of Solr
• http://localhost:8983/solr/select/?q=solr
20.
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="q">solr</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="abstract">
        A brief introduction to using Apache Solr for implementing search for your website.
      </str>
      <str name="django_ct">codecamp.session</str>
      <str name="django_id">19</str>
      <str name="id">codecamp.session.19</str>
      <str name="text">
        Searching with Solr: An Introduction A brief introduction to using Apache Solr for implementing search for your website.
      </str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>
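Reading the XML response above takes only a few lines with the standard library. The sketch below works on a trimmed copy of that response; `parse_results` is a hypothetical helper and it only handles single-valued fields like the ones shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above.
SAMPLE = """
<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">codecamp.session.19</str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>
"""


def parse_results(xml_text):
    """Return (numFound, docs), each doc a {field name: text} dict."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("result")
    docs = [
        {field.get("name"): (field.text or "").strip() for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


found, docs = parse_results(SAMPLE)
print(found, docs[0]["id"])
```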
23.
Getting Data out of Solr: JSON
• http://localhost:8983/solr/select/?q=solr&wt=json
24.
{
  "responseHeader": {
    "status":0,
    "QTime":0,
    "params": {
      "wt":"json",
      "q":"solr"
    }
  },
  "response": {
    "numFound":1,
    "start":0,
    "docs":[{
      "django_id":"19",
      "title":"Searching with Solr: An Introduction",
      "text":"Searching with Solr: An Introduction\nA brief introduction to using Apache Solr for implementing search for your website.",
      "abstract":"A brief introduction to using Apache Solr for implementing search for your website.",
      "django_ct":"codecamp.session",
      "id":"codecamp.session.19"
    }]
  }
}
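The JSON form is the easiest one to consume from application code. A minimal sketch, using a trimmed copy of the response above and a hypothetical `titles` helper:

```python
import json

# Trimmed copy of the JSON response above.
SAMPLE = """
{
  "responseHeader": {"status": 0, "QTime": 0},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {"id": "codecamp.session.19",
       "title": "Searching with Solr: An Introduction"}
    ]
  }
}
"""


def titles(raw_json):
    """Return the title of every document in a Solr JSON response."""
    body = json.loads(raw_json)
    if body["responseHeader"]["status"] != 0:
        raise RuntimeError("Solr reported a non-zero status")
    return [doc.get("title") for doc in body["response"]["docs"]]


print(titles(SAMPLE))
```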
25.
Deleting Data from Solr
• POST it
<delete><id>codecamp.session.19</id></delete>
<delete><query>company:blend</query></delete>
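Both delete forms can be generated with a little escaping so that ids or queries containing `&` or `<` stay valid XML. The helper names here are hypothetical; as with adds, the deletion would be POSTed to the update URL and followed by a commit.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id):
    """Build a delete message for a single document id."""
    return f"<delete><id>{escape(doc_id)}</id></delete>"


def delete_by_query(query):
    """Build a delete message for every document matching `query`."""
    return f"<delete><query>{escape(query)}</query></delete>"


print(delete_by_id("codecamp.session.19"))
print(delete_by_query("company:blend"))
```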
26.
The Solr Schema
• schema.xml
• Defines ‘types’ used in the webapp
• Defines the fields
• Defines ‘copyFields’
• Read the schema inside the example project for more
27.
The Solr Schema
• Types
• Define how a field and query should be processed
• Word Stemming
• Case Folding
• How would you handle a search for ‘C.I.A.’?
• Dates, ints, floats, etc. are defined here as well
• 2 Modes
• Index Time
• Query Time
31.
Fields
• The elements of a document
• Both Predefined and Dynamic
• Fields may occur multiple times
• May be indexed and/or stored
38.
<copyField source="bio" dest="df_text" />
<copyField source="year" dest="century" maxChars="2"/>
2000 would be stored as 20
Useful for custom faceting
39.
The Solr Config File
• solrconfig.xml
• Defines request handlers, defaults, & caches
• Read the solrconfig.xml inside the example project for more
40.
Other Solr Tools
• Debug Query
• Boost Functions
• Search Faceting
• Search Filters
• Search Highlighting
• Solr Admin
41.
Debug Query Option
• Add &debugQuery=on to request parameters
• Returns a parsed form of the query
46.
Boost Function
• Allows you to influence results at query time
• Really useful for tuning scoring
• You can also boost at index time
More information available: http://wiki.apache.org/solr/SolrRelevancyFAQ
Can use both dismax and standard query handlers; I use dismax
q=blend&qf=text^2 company
47.
Boost Function
&bq=text:blend^2
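The `qf` and `bq` parameters shown on these slides can be assembled and URL-encoded as in the sketch below. Selecting dismax with `defType=dismax` is an assumption on my part (the slides do not show how the handler is chosen), and `dismax_params` is a hypothetical helper.

```python
from urllib.parse import urlencode


def dismax_params(query, field_boosts, boost_query=None):
    """Return URL-encoded dismax parameters.

    field_boosts maps field name -> boost (None means no explicit boost).
    """
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    # defType=dismax is assumed here; a preconfigured handler would also work.
    params = [("defType", "dismax"), ("q", query), ("qf", qf)]
    if boost_query:
        params.append(("bq", boost_query))
    return urlencode(params)


print(dismax_params("blend", {"text": 2, "company": None}, "text:blend^2"))
```

Here `qf=text^2 company` searches both fields while weighting matches in `text` twice as heavily, and `bq` adds an extra boost for documents matching `text:blend`.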
48.
Solr Faceting
• What is a facet?
• “Interaction style where users filter a set of items by progressively selecting from only valid values of a faceted classification system” - Keith Instone, SOASIS&T, July 8, 2004
• What does it look like?
• Make sure to use an untokenized field (e.g. string)
• “San Jose” != “san”+“jose”
49.
q=*:*
facet=on
facet.field=company
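A facet request and its counts might be handled as sketched below. With the JSON writer's default settings, Solr returns each facet field as a flat list that alternates value and count; the sample response fragment is hypothetical (it is not from the talk), as are the helper names.

```python
from urllib.parse import urlencode


def facet_params(field, query="*:*"):
    """URL-encode a facet request on one field, asking for JSON output."""
    return urlencode([("q", query), ("facet", "on"),
                      ("facet.field", field), ("wt", "json")])


def facet_counts(body, field):
    """Pair up the flat [value, count, value, count, ...] list for `field`."""
    flat = body["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment, not taken from the talk.
sample = {"facet_counts": {"facet_fields": {
    "company": ["Blend Interactive", 3, "Apache", 1]}}}
print(facet_params("company"))
print(facet_counts(sample, "company"))
```

Because `company` is assumed to be an untokenized field, "Blend Interactive" comes back as a single facet value rather than as two separate words.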
51.
Solr Filter Query
• Used to narrow your search query
• Restricts the superset of documents that can be returned
• ‘fq’ parameter (short for Filter Query)
q=*:*
fq=company:blend
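Filter queries are passed as extra `fq` parameters next to `q`. In this sketch the hypothetical `filtered_params` helper emits one `fq` per clause; when several are given, Solr returns only documents matching all of them.

```python
from urllib.parse import urlencode


def filtered_params(query, filters):
    """URL-encode `q` plus one `fq` parameter per filter clause."""
    pairs = [("q", query)] + [("fq", clause) for clause in filters]
    return urlencode(pairs)


print(filtered_params("*:*", ["company:blend"]))
```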
53.
Search Highlighting
• Allow Solr to generate your highlight
54.
hl=true
hl.simple.pre=<b>
hl.simple.post=</b>
hl.fragsize=200
hl.requireFieldMatch=false
hl.fl=text bio title
hl.snippets=1
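The highlighting parameters above can be kept in one mapping and merged into any query. Solr returns the fragments in a separate `highlighting` section keyed by document id and then by field name; the sample fragment below is hypothetical, and both helpers are illustrative names rather than part of any client library.

```python
from urllib.parse import urlencode

# The highlighting parameters from the slide, as a reusable mapping.
HIGHLIGHT_PARAMS = {
    "hl": "true",
    "hl.simple.pre": "<b>",
    "hl.simple.post": "</b>",
    "hl.fragsize": 200,
    "hl.requireFieldMatch": "false",
    "hl.fl": "text bio title",
    "hl.snippets": 1,
}


def highlight_query(query):
    """URL-encode a query with the highlighting parameters and JSON output."""
    return urlencode({"q": query, "wt": "json", **HIGHLIGHT_PARAMS})


def snippets_for(body, doc_id, field):
    """Return the highlighted fragments for one document and field."""
    return body.get("highlighting", {}).get(doc_id, {}).get(field, [])


# Hypothetical response fragment, not taken from the talk.
sample = {"highlighting": {"codecamp.session.19": {
    "text": ["Searching with <b>Solr</b>: An Introduction"]}}}
print(snippets_for(sample, "codecamp.session.19", "text"))
```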
55.
Solr Admin
• http://localhost:8983/solr/admin/
• Built-in app for testing all search options
• Field Analysis
• Schema Browser
• Full Query Interface
• Solr Statistics
• Solr Information
• Many More Options
56.
Solr/Browse
• Test your search configuration using the /browse requestHandler
57.
Resources
• Apache Solr Website
• http://lucene.apache.org/solr/
• Wiki, mailing list, bugs/features
• Books