Make Your Data Searchable With Solr in 25 Minutes

Kai Chan
BruinTech Tech-a-Thon, November 19, 2013

The Goal
(diagram: a body of data, with one item marked "find this")

The Goal
• objectives
o find something in the (text) data
o get the results fast
o get the most relevant results first
o avoid getting the not-so-relevant results first
• (one) solution: Solr

What Solr is
• used by high-profile websites like Twitter
… and interesting projects like NewsScape
• open-source, full-text search platform
• uses Lucene for indexing and searching
• standalone process/program (typically)
• REST-like API over HTTP
• different output formats (XML, JSON, CSV)
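
To illustrate the JSON output format, here is a minimal sketch of reading a Solr-style response in Python. The sample payload is hand-written for illustration, not captured from a real server; it follows the general shape of a JSON response (a responseHeader plus a response object with numFound, start, and docs). The function name summarize_response and the field names are hypothetical.

```python
import json

# Hand-written sample in the general shape of a Solr JSON response
# (illustrative only, not output captured from a real server).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "subject": "debt ceiling talks"},
      {"id": "2", "subject": "budget negotiation"}
    ]
  }
}
"""


def summarize_response(raw):
    """Return (number of matches, list of returned documents)."""
    body = json.loads(raw)["response"]
    return body["numFound"], body["docs"]


if __name__ == "__main__":
    found, docs = summarize_response(SAMPLE_RESPONSE)
    print(found, "matching documents")
    for doc in docs:
        print(doc["id"], doc["subject"])
```

Because the response is plain JSON, a browser front end could read the same structure with its built-in JSON parser.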

How to Talk To Solr
• have front-end/browser make HTTP requests
• language-specific clients
o .Net
o Java
o PHP
o Python
o Ruby
• integration with other applications
o Moodle
o Drupal
o Plone

How Solr works
• query (i.e. search criteria) goes to Solr; result (i.e. things being looked for) comes back
• Solr answers the query from its index
• the index is built from the data to be searched
• additions, updates, deletions turn the index into index’, so the same query then returns result’
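
The role of the index in the flow above can be illustrated with a toy inverted index. This is only a sketch of the general idea (map each word to the documents that contain it); it is not how Lucene or Solr is actually implemented, and the class name ToyIndex and its methods are made up for this example.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.docs = {}                    # document id -> original text

    def add(self, doc_id, text):
        """Add a document, or update it if the id already exists."""
        self.delete(doc_id)
        self.docs[doc_id] = text
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and its entries from the index."""
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for word in text.lower().split():
            self.postings[word].discard(doc_id)

    def search(self, word):
        """Return the sorted ids of documents containing the word."""
        return sorted(self.postings.get(word.lower(), set()))


if __name__ == "__main__":
    index = ToyIndex()
    index.add(1, "debt ceiling negotiation")
    index.add(2, "budget negotiation")
    print(index.search("negotiation"))  # both documents match
    index.delete(2)
    print(index.search("negotiation"))  # the same query now gives a different result
```

The query is answered from the index alone, without rescanning the original data, which is why changes to the data have to be applied to the index before they show up in results.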

How Data Are Organized
• a collection contains documents; a document contains fields
• example: e-mail documents with fields such as subject, date, from, reply-to, text (not every document has every field)
• one collection can hold documents with different fields, e.g.
o subject, date, from, text
o title, SKU, price, description
o first name, last name, phone, address
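
As a rough mental model of the organization above, a collection can be pictured as a list of documents, each mapping field names to values. The sketch below uses plain Python dictionaries; all values are made up, and the underscore spellings of the field names (first_name, last_name) are an assumption made for this example.

```python
# A collection modeled as a list of documents; each document maps
# field names to values. All values below are made up for illustration.
collection = [
    {"subject": "Meeting", "date": "2013-11-19", "from": "a@example.com",
     "text": "See you at noon."},
    {"title": "Widget", "SKU": "W-100", "price": 9.99,
     "description": "A small widget."},
    {"first_name": "Pat", "last_name": "Lee", "phone": "555-0100",
     "address": "1 Main St."},
]


def field_names(docs):
    """Return the sorted set of field names used anywhere in the collection."""
    return sorted({name for doc in docs for name in doc})


if __name__ == "__main__":
    for name in field_names(collection):
        print(name)
```

The three documents share no fields at all, which mirrors the point that documents in one collection do not have to look alike.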

Solr Field Definition
• field
o name
o type
o options
• field type
o text: "string", "text_general"
o numeric: "int", "long", "float", "double"
• options
o indexed: content can be searched
o stored: content can be returned at search-time
o multivalued: multiple values per field & document

Solr Dynamic Field
• define field by naming convention
• "amount_i": int, indexed, stored
• "tag_ss": string, indexed, stored, multivalued
name    type          indexed  stored  multiValued
*_i     int           true     true    false
*_l     long          true     true    false
*_f     float         true     true    false
*_d     double        true     true    false
*_s     string        true     true    false
*_ss    string        true     true    true
*_t     text_general  true     true    false
*_txt   text_general  true     true    true
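
The naming convention in the table above can be sketched as a lookup from field name to rule. This is an illustration of the idea only: the function name resolve_dynamic_field is hypothetical, and the sketch does not try to reproduce Solr's own matching or precedence rules. It happens that no field name can match two of these particular patterns, so a simple first-match loop is enough here.

```python
from fnmatch import fnmatchcase

# Dynamic field rules copied from the table above:
# pattern -> (type, indexed, stored, multiValued)
DYNAMIC_FIELDS = {
    "*_i": ("int", True, True, False),
    "*_l": ("long", True, True, False),
    "*_f": ("float", True, True, False),
    "*_d": ("double", True, True, False),
    "*_s": ("string", True, True, False),
    "*_ss": ("string", True, True, True),
    "*_t": ("text_general", True, True, False),
    "*_txt": ("text_general", True, True, True),
}


def resolve_dynamic_field(name):
    """Return the rule for the first pattern matching the field name, else None."""
    for pattern, rule in DYNAMIC_FIELDS.items():
        if fnmatchcase(name, pattern):
            return rule
    return None


if __name__ == "__main__":
    for name in ("amount_i", "tag_ss", "title"):
        print(name, resolve_dynamic_field(name))
```

A name such as "title" matches no pattern, so it would need an explicit field definition instead.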

Getting Data into Solr
• submit (post) files to Solr
o XML
o JSON
o CSV
• have Solr pull data from database or file
o RDBMS
o XML data locally (file) or remotely (HTTP)
o extract data (XPath)
o manipulate data (regex replace, strip HTML tags)
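
Posting JSON to Solr might look like the sketch below, which builds the HTTP request without sending it. The endpoint URL (localhost, port 8983, the /solr/update path, and the commit=true parameter) is an assumption about a typical installation and will differ between setups; the id field and the other field names are likewise made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint; adjust host, port, and path to match your installation.
UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_update_request(docs, url=UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying documents as a JSON array."""
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; field names follow the dynamic field suffixes.
docs = [
    {"id": "1", "subject_t": "debt ceiling", "amount_i": 42,
     "tag_ss": ["news", "politics"]},
]
request = build_update_request(docs)

# To actually send it (requires a running Solr server):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes the payload easy to inspect before any data reaches the server.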

Searching Data in Solr
• send request to http://host:port/solr/search
• parameters
o q - main query
o fl - fields to return
o sort - sort criteria
o wt - response writer (e.g. xml, json)
o indent - set to true for pretty-printing
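
The parameters above can be combined into a request URL as in this sketch. The base URL is the one written on the slide, with "host" and "port" left as placeholders; many installations expose search under a different path (for example a core-specific /select handler), so treat the path as an assumption. The function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as written on the slide; "host" and "port" are placeholders.
SEARCH_URL = "http://host:port/solr/search"


def build_search_url(q, fl=None, sort=None, wt="json", indent=True, base=SEARCH_URL):
    """Build a search URL from the main query and optional parameters."""
    params = {"q": q, "wt": wt}
    if fl:
        params["fl"] = ",".join(fl)  # fields to return, comma-separated
    if sort:
        params["sort"] = sort        # e.g. "date desc"
    if indent:
        params["indent"] = "true"    # pretty-print the response
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url('text:"debt ceiling"', fl=["subject", "date"], sort="date desc"))
```

urlencode takes care of escaping characters such as the colon, quotes, and spaces, which would otherwise break the URL.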

Query Syntax
• basic format: field name “:” word/phrase
text:negotiation
text:"debt ceiling"

Query Syntax
• several clauses: separated by space
text:negotiation subject:debt
• make the word/phrase required: “+” prefix
+text:negotiation +subject:debt
• make the word/phrase prohibited: “-” prefix
text:negotiation -subject:debt
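
The syntax rules above are simple enough to assemble programmatically. The helpers below are a minimal sketch with hypothetical names (clause, query); they quote phrases that contain spaces but make no attempt to escape special characters inside the value, which real code would need to handle.

```python
def clause(field, value, required=False, prohibited=False):
    """Format one field:value clause, quoting phrases that contain spaces."""
    if required and prohibited:
        raise ValueError("a clause cannot be both required and prohibited")
    if " " in value:
        value = '"' + value + '"'
    prefix = "+" if required else "-" if prohibited else ""
    return prefix + field + ":" + value


def query(*clauses):
    """Join clauses with spaces to form the main query string."""
    return " ".join(clauses)


if __name__ == "__main__":
    print(query(clause("text", "negotiation"), clause("subject", "debt")))
    print(query(clause("text", "negotiation", required=True),
                clause("subject", "debt", required=True)))
    print(query(clause("text", "debt ceiling"),
                clause("subject", "debt", prohibited=True)))
```

The resulting string is what would be passed as the q parameter of a search request.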

Additional Things Solr Can Do
• other types of queries
o range
o fuzzy
o wildcard
o regex
o proximity
o spatial
o join
• sorting
• faceted search
• … and more
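
A few of the query types listed above are sketched below as query strings in Lucene/Solr syntax, together with the parameters for sorting and a simple field facet. The field names (price, text, from, date) are hypothetical, and the helper faceted_search_params is made up for this example; spatial and join queries are left out because they depend more heavily on schema setup.

```python
from urllib.parse import urlencode

# Example query strings in Lucene/Solr syntax. Field names are hypothetical.
EXAMPLE_QUERIES = {
    "range": "price:[10 TO 100]",        # values between 10 and 100, inclusive
    "fuzzy": "text:negotiation~",        # words similar to "negotiation"
    "wildcard": "text:negotia*",         # words starting with "negotia"
    "regex": "text:/negotiat(e|ion)/",   # words matching the regular expression
    "proximity": 'text:"debt ceiling"~5',  # the two words within 5 positions
}


def faceted_search_params(q, facet_field, sort=None):
    """Return an encoded parameter string for a query with one field facet."""
    params = [("q", q), ("facet", "true"), ("facet.field", facet_field)]
    if sort:
        params.append(("sort", sort))
    return urlencode(params)


if __name__ == "__main__":
    for kind, q in EXAMPLE_QUERIES.items():
        print(kind, "->", q)
    print(faceted_search_params(EXAMPLE_QUERIES["range"], "from", sort="date desc"))
```

The encoded string would be appended to the search URL after a "?".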

Conclusion
• more about Solr: http://lucene.apache.org/solr/
• Solr reference guide: http://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
• my e-mail: kai@ssc.ucla.edu
• questions?
