Search Engine-Building
with Lucene and Solr
Kai Chan
SoCal Code Camp, July 2013
How to Search - One Approach
for each document d {
if (query is a substring of d's content) {
add d to the list of results
}
}
sort the result (or not)
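The loop above can be sketched in Python roughly as follows. This is only an illustration of the linear-scan approach; the function name `naive_search` and the three sample documents (borrowed from the inverted index example later in the deck) are assumptions made for the sketch.

```python
def naive_search(documents, query):
    """Return the ids of documents whose content contains the query as a substring."""
    results = []
    for doc_id, content in enumerate(documents):
        # Every search reads every document in full.
        if query in content:
            results.append(doc_id)
    return results


# Sample documents (assumed for illustration).
documents = ["it is what it is", "what is it", "it is a banana"]

print(naive_search(documents, "banana"))
print(naive_search(documents, "what is"))
```

The cost of each call grows with the total size of `documents`, which is the scalability problem the next slide describes.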
How to Search - Problems
● slow
○ reads the whole dataset for each search
● not scalable
○ if your dataset grows by 10x,
your search slows down by 10x
● how to show the most relevant documents
first?
○ list of results can be quite long
○ users have limited time and patience
Inverted Index - Introduction
● like the "index" at the end of books
● a map of one of the following types
○ term → document list
○ term → <document, position> list
documents:
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
inverted index (without positions):
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}
inverted index (with positions):
"a": {(2, 2)}
"banana": {(2, 3)}
"is": {(0, 1), (0, 4), (1, 1), (2, 1)}
"it": {(0, 0), (0, 3), (1, 2), (2, 0)}
"what": {(0, 2), (1, 0)}
Credit: Wikipedia (http://en.wikipedia.org/wiki/Inverted_index)
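A minimal sketch of how the two index variants above could be built, assuming terms are obtained by splitting on whitespace (real analyzers do more, such as lowercasing and stemming). The function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to a list of (document id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for position, term in enumerate(text.split()):
            index[term].append((doc_id, position))
    return index


def without_positions(index):
    """Collapse the positional index to term -> sorted list of document ids."""
    return {
        term: sorted({doc_id for doc_id, _ in postings})
        for term, postings in index.items()
    }


documents = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(documents)

for term in sorted(index):
    print(term, index[term])
```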
Inverted Index - Speed
● term list
○ typically very small
○ grows slowly
● term lookup
○ O(1) to O(log(number of terms))
● for a particular term
○ document lists: very small
○ document + position lists: still small
● few terms per query
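To illustrate why lookups are cheap, here is a sketch that answers an "all terms must occur" query using only the index, never the documents themselves. The dictionary literal repeats the position-free index from the earlier slide; `search_all_terms` is a hypothetical helper.

```python
# Position-free inverted index from the earlier example.
index = {
    "a": {2},
    "banana": {2},
    "is": {0, 1, 2},
    "it": {0, 1, 2},
    "what": {0, 1},
}


def search_all_terms(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.split()
    if not terms:
        return set()
    # One dictionary lookup per term, then an intersection of small sets.
    doc_sets = [index.get(term, set()) for term in terms]
    return set.intersection(*doc_sets)


print(search_all_terms(index, "what is"))
print(search_all_terms(index, "banana it"))
```

The work done depends on the number of query terms and the length of their document lists, not on the size of the whole dataset.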
Inverted Index - Relevance
● information in the index enables:
○ determination (scoring) of relevance of each
document to the query
○ comparison of relevance among documents
○ sorting by (decreasing) relevance
■ i.e. the most relevant document first
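As a rough illustration of scoring and sorting, the sketch below ranks documents with a simple TF-IDF sum. This is an assumed, simplified formula chosen for the example; it is not Lucene's actual scoring function.

```python
import math
from collections import Counter


def rank(documents, query):
    """Return (document id, score) pairs, most relevant first."""
    tokenized = [doc.split() for doc in documents]
    n = len(tokenized)
    scored = []
    for doc_id, terms in enumerate(tokenized):
        counts = Counter(terms)
        total = 0.0
        for term in query.split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df == 0 or counts[term] == 0:
                continue
            # Rare terms weigh more; repeated terms weigh more.
            total += counts[term] * math.log(1 + n / df)
        if total > 0:
            scored.append((doc_id, total))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = ["it is what it is", "what is it", "it is a banana"]
for doc_id, score in rank(documents, "it what"):
    print(doc_id, round(score, 3))
```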
Lucene vs. Solr - Lucene
● full-text search library
● creates, updates and reads from the index
● takes queries and produces search results
● your application creates objects and calls
methods in the Lucene API
● provides building blocks for custom features
Lucene vs. Solr - Solr
● full-text search server
● uses Lucene for indexing and search
● REST-like API over HTTP
● different output formats (e.g. XML, JSON)
● provides some features not built into Lucene
[Diagram: deployment of Lucene and Solr]
Lucene:
● machine running Java VM
○ your application
■ Lucene: Lucene code, libraries, index
Solr:
● client, connecting over HTTP to
● machine running Java VM
○ servlet container (e.g. Tomcat, Jetty)
■ Solr: Solr code, Lucene code, libraries, index
Workflow
Setup
Indexing
Search
Workflow - Setup
● servlet configuration
○ e.g. port number, max POST size
○ you can usually use the default settings
● Solr configuration
○ e.g. data directory, deduplication, language
identification, highlighting
○ you can usually use the default settings
● schema definition
○ defines fields in your documents
○ you can use the default settings if you name your
fields in a certain way
How Data Are Organized
[Diagrams: collection, documents and fields]
● a collection contains documents
● a document contains fields
● a field has
○ content (e.g. "please read" or 30)
○ name (e.g. "title" or "price")
○ type
○ options
● example index 1: three e-mail documents
○ fields drawn from: subject, date, from, text, reply-to
○ not every document has every field
● example index 2: three documents of different kinds
○ subject, date, from, text
○ title, SKU, price, description
○ first name, last name, phone, address
Solr Field Definition
● field
○ name (e.g. "subject")
○ type (e.g. "text_general")
○ options (e.g. indexed="true" stored="true")
● field type
○ text: "string", "text_general"
○ numeric: "int", "long", "float", "double"
● options
○ indexed: content can be searched
○ stored: content can be returned at search-time
○ multivalued: multiple values per field & document
Solr Dynamic Field
● define field by naming convention
● "amount_i": int, indexed, stored
● "tag_ss": string, indexed, stored, multivalued
name type indexed stored multiValued
*_i int true true false
*_l long true true false
*_f float true true false
*_d double true true false
*_s string true true false
*_ss string true true true
*_t text_general true true false
*_txt text_general true true true
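The naming convention in the table can be modelled as a suffix lookup. The sketch below only mimics the table above in plain Python; it is not how Solr resolves dynamic fields internally, and `resolve_field` is a hypothetical name.

```python
# Suffix -> (type, multiValued), copied from the table above.
# All of these dynamic fields are indexed and stored.
DYNAMIC_FIELDS = {
    "_i": ("int", False),
    "_l": ("long", False),
    "_f": ("float", False),
    "_d": ("double", False),
    "_s": ("string", False),
    "_ss": ("string", True),
    "_t": ("text_general", False),
    "_txt": ("text_general", True),
}


def resolve_field(name):
    """Return the field definition implied by the name's suffix, or None."""
    matches = [suffix for suffix in DYNAMIC_FIELDS if name.endswith(suffix)]
    if not matches:
        return None
    # Prefer the longest matching suffix.
    suffix = max(matches, key=len)
    type_name, multi_valued = DYNAMIC_FIELDS[suffix]
    return {
        "type": type_name,
        "indexed": True,
        "stored": True,
        "multiValued": multi_valued,
    }


print(resolve_field("amount_i"))
print(resolve_field("tag_ss"))
```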
Solr Copy Field
● copy one or more fields into another field
● can be used to define a catch-all field
○ source: "title", "author", "description"
○ destination: "text"
○ searching the "text" field has the effect of searching
all the other three fields
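The effect of a copy field can be imitated in a few lines. This is a plain-Python analogy, assuming documents are dictionaries; in Solr the copying is configured in the schema and happens at index time. The sample document and helper names are made up for the sketch.

```python
def apply_copy_fields(doc, sources, destination):
    """Return a copy of doc with the source field values gathered into destination."""
    copied = dict(doc)
    copied[destination] = [doc[field] for field in sources if field in doc]
    return copied


def matches_catch_all(doc, word, field="text"):
    """True if any value in the catch-all field contains the word."""
    return any(word in value.split() for value in doc.get(field, []))


# Hypothetical document.
doc = {"title": "Solr basics", "author": "Kai Chan", "description": "search engine talk"}
indexed = apply_copy_fields(doc, ["title", "author", "description"], "text")

print(matches_catch_all(indexed, "Chan"))
print(matches_catch_all(indexed, "engine"))
```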
Indexing - UpdateRequestHandler
● upload content or file to http://host:port/solr/update
● formats: XML, JSON, CSV
XML:
<add>
  <doc>
    <field name="id">apple</field>
    <field name="compName_s">Apple</field>
    <field name="address_s">1 Infinite Way, Cupertino CA</field>
  </doc>
  <doc>
    <field name="id">asus</field>
    <field name="compName_s">ASUS Computer</field>
    <field name="address_s">800 Corporate Way Fremont, CA 94539</field>
  </doc>
</add>
CSV:
id,compName_s,address_s
apple,Apple,"1 Infinite Way, Cupertino CA"
asus,Asus Computer,"800 Corporate Way Fremont, CA 94539"
JSON:
[
{"id":"apple","compName_s":"Apple","address_s":"1 Infinite Way, Cupertino CA"},
{"id":"asus","compName_s":"Asus Computer","address_s":"800 Corporate Way Fremont, CA 94539"}
]
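A sketch of preparing such an upload from Python's standard library. The host, port and the `commit=true` parameter are assumptions for a local default install; the request is only constructed here, not sent, so the example runs without a server.

```python
import json
import urllib.request


def build_update_request(docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a JSON POST request for the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/update?commit=true",  # assumed: commit immediately
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [
    {"id": "apple", "compName_s": "Apple",
     "address_s": "1 Infinite Way, Cupertino CA"},
    {"id": "asus", "compName_s": "Asus Computer",
     "address_s": "800 Corporate Way Fremont, CA 94539"},
]

request = build_update_request(docs)
print(request.get_method(), request.full_url)
# To actually send it to a running Solr instance:
# urllib.request.urlopen(request)
```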
Indexing - DataImportHandler
● has its own config file (data-config.xml)
● import data from various sources
○ RDBMS (JDBC)
○ e-mail (IMAP)
○ XML data locally (file) or remotely (HTTP)
● transformers
○ extract data (RegEx, XPath)
○ manipulate data (strip HTML tags)
Searching - Basics
● send request to http://host:port/solr/select
● parameters
○ q - main query
○ fq - filter query
○ defType - query parser (e.g. lucene, edismax)
○ fl - fields to return
○ sort - sort criteria
○ wt - response writer (e.g. xml, json)
○ indent - set to true for pretty-printing
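http://localhost:8983/solr/select?q=title:tablet&fl=title,price,inStock&sort=price+asc&wt=json
○ http://localhost:8983/solr/select - search handler's URL
○ q=title:tablet - main query
○ fl=title,price,inStock - fields to return
○ sort=price+asc - sort criteria (field name plus asc or desc)
○ wt=json - response writer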
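Building the request URL by hand is error-prone because parameter values need URL encoding. A small sketch using the standard library follows; the parameter values repeat the example, with the `asc` sort direction added as an assumption, and `build_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_select_url(params, base_url="http://localhost:8983/solr/select"):
    """Return the search URL with all parameters URL-encoded."""
    return base_url + "?" + urlencode(params)


url = build_select_url({
    "q": "title:tablet",          # main query
    "fl": "title,price,inStock",  # fields to return
    "sort": "price asc",          # sort criteria (direction assumed)
    "wt": "json",                 # response writer
})
print(url)
```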
Searching - Query Syntax - Field
● search a specific field
○ field_name:value
● if field omitted, Solr uses default field:
○ df parameter in URL
○ defaultSearchField setting in schema.xml
○ "text"
Searching - Query Syntax - Term
● a term by itself: matches documents that
contain that term
○ e.g. tablet
Searching - Query Syntax - Boolean
● boolean operators are supported
○ AND &&
○ OR ||
○ NOT !
● e.g. a AND b
○ all of a, b must occur
● e.g. a OR b
○ at least one of a, b must occur
● e.g. a AND NOT b
○ a must occur and b must not occur
Searching - Query Syntax - Boolean
● Lucene/Solr's boolean operators are not true
boolean operators
● e.g. a OR b OR c does not behave like
(a OR b) OR c
○ instead, a OR b OR c means at least one of a, b, c
must occur
● parentheses are supported
Searching - Query Syntax - Boolean
● "+" prefix means "must"
● "-" prefix means "must not"
● no prefix means "optional"
○ if the query has no "+" clause, at least one
optional clause must occur (by default)
○ e.g. a b c
■ at least one of a, b, c must occur
● operators can mix
○ e.g. +a b c d -e
■ a must occur
■ b, c, d are optional; matching them only raises
the score
■ e must not occur
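The prefix semantics can be written down as a small predicate. The sketch assumes Lucene's default behaviour, in which unprefixed (optional) clauses are required only when the query contains no "+" clause; when a "+" clause is present they affect scoring but not matching. The function name and argument layout are hypothetical.

```python
def matches(doc_terms, must=(), should=(), must_not=()):
    """Decide whether a document matches a query of +, unprefixed and - clauses."""
    terms = set(doc_terms)
    if any(term not in terms for term in must):
        return False
    if any(term in terms for term in must_not):
        return False
    # Assumed default: optional clauses become mandatory ("at least one")
    # only when there is no required clause at all.
    if not must and should:
        return any(term in terms for term in should)
    return True


# a b c : at least one of a, b, c must occur
print(matches({"b", "x"}, should=["a", "b", "c"]))
# +a b c d -e : a required, e forbidden
print(matches({"a", "c"}, must=["a"], should=["b", "c", "d"], must_not=["e"]))
print(matches({"a", "e"}, must=["a"], should=["b", "c", "d"], must_not=["e"]))
```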
Searching - Query Syntax - Phrase
● phrases are enclosed by double-quotes
● e.g. +"the phrase"
○ the phrase must occur
● e.g. -"the phrase"
○ the phrase must not occur
Searching - Query Syntax - Boost
● manually assign different weights to clauses
● gives more weight to a field
○ e.g. title:a^10 body:a
● gives more weight to a word
○ e.g. title:a title:b^10
● gives phrases more weight than words
○ e.g. title:(+a +b) title:"a b"^10
Searching - Query Syntax - Range
● matches field values within a range
○ inclusive range - denoted by square brackets
○ exclusive range - denoted by curly brackets
● e.g. age:[10 TO 20]
○ matches the field "age" with the value in 10..20
● string or numeric comparison, depending on
the field's type
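The difference between string and numeric comparison is easy to overlook. The sketch below uses a hypothetical `in_range` helper to show that the same range, age:[10 TO 20], accepts different values depending on how the field's values are compared.

```python
def in_range(value, low, high, include_low=True, include_high=True):
    """Check a value against a range with inclusive [ ] or exclusive { } bounds."""
    above = value >= low if include_low else value > low
    below = value <= high if include_high else value < high
    return above and below


# Numeric field: 100 is outside [10 TO 20].
print(in_range(100, 10, 20))
# String field: "100" sorts between "10" and "20", so it is inside.
print(in_range("100", "10", "20"))
# Exclusive upper bound: 20 is outside a range that excludes its upper end.
print(in_range(20, 10, 20, include_high=False))
```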
Searching - Query Syntax - EDisMax
● suitable for user-generated queries
○ supports a subset of Lucene QP's syntax
○ does not complain about the syntax
○ searches for individual words across several fields
("disjunction")
○ uses max score of a word in all fields for scoring
("max")
● configurable (in solrconfig.xml)
○ what fields to search the words in
○ weighting of these fields
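The "disjunction max" idea can be sketched as follows: each word is scored in every configured field, the best field score is kept, and the per-word maxima are added up. Term frequency times field weight stands in for the real per-field score here, which is an assumption made purely for illustration; the field names and weights are also made up.

```python
def dismax_score(doc, words, field_weights):
    """Sum, over the query words, of the best weighted score across fields."""
    total = 0.0
    for word in words:
        per_field = [
            weight * doc.get(field, "").split().count(word)
            for field, weight in field_weights.items()
        ]
        # "max": only the best-matching field counts for this word.
        total += max(per_field, default=0.0)
    return total


# Hypothetical document and field weights.
doc = {"title": "solr tablet", "body": "tablet tablet review"}
weights = {"title": 10, "body": 1}

print(dismax_score(doc, ["tablet", "review"], weights))
```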
Sorting
● default: sorting by decreasing score
● sorting by field: using the sort parameter
○ specify field name and order
■ price asc - sort by "price" field, ascending
■ price desc - sort by "price" field, descending
○ multiple fields and orders by comma
■ starRating desc, price asc - sort by
"starRating" field, descending, and then by
"price" field, ascending
○ cannot use multivalued fields
○ overrides sorting by decreasing relevance
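The multi-field ordering "starRating desc, price asc" corresponds to a compound sort key. A sketch over a hypothetical list of result documents with numeric fields:

```python
def sort_results(docs):
    """Order by starRating descending, then by price ascending."""
    return sorted(docs, key=lambda d: (-d["starRating"], d["price"]))


# Hypothetical search results.
results = [
    {"name": "tablet A", "starRating": 4, "price": 300.0},
    {"name": "tablet B", "starRating": 5, "price": 500.0},
    {"name": "tablet C", "starRating": 5, "price": 200.0},
]

for doc in sort_results(results):
    print(doc["name"], doc["starRating"], doc["price"])
```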
Faceted Search
● facet values: (distinct) values or (generally
non-overlapping) ranges of a field
● displaying facets
○ show possible values
○ let users narrow down their searches easily
[Figure: screenshot showing a facet and its facet values (5 of them)]
Faceted Search
● set facet parameter to true - enables
faceting
● other parameters
○ facet.field - use the field's values as facets
■ return <value, count> pairs
○ facet.query - use the given queries as facets
■ return <query, count> pairs
○ facet.sort - set the ordering of the facets;
■ can be "count" or "index"
○ facet.offset and facet.limit - used for
pagination of facets
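What facet.field returns can be imitated with a counter over the matching documents. The sketch below is a plain-Python analogy; the `facet_field` helper, its arguments (named after the parameters above) and the sample documents are assumptions, and ties in the "count" ordering are broken by value here.

```python
from collections import Counter


def facet_field(docs, field, sort="count", offset=0, limit=None):
    """Return (value, count) pairs for a field, ordered and paginated."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    if sort == "count":
        # Highest count first; ties broken by value.
        pairs = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    else:
        # "index" order: sorted by the value itself.
        pairs = sorted(counts.items())
    end = None if limit is None else offset + limit
    return pairs[offset:end]


# Hypothetical matching documents.
docs = [
    {"id": 1, "brand": "Asus"},
    {"id": 2, "brand": "Apple"},
    {"id": 3, "brand": "Asus"},
    {"id": 4, "brand": "Acer"},
]

print(facet_field(docs, "brand"))
print(facet_field(docs, "brand", sort="index"))
```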
Resources - Books
● Lucene in Action
○ written by 3 committers and PMC members
○ somewhat outdated (2010; covers Lucene 3.0)
○ http://www.manning.com/hatcher3/
● Solr in Action
○ early access; coming out later this year
○ http://www.manning.com/grainger/
● Apache Solr 4 Cookbook
○ common problems and useful tips
○ http://www.packtpub.com/apache-solr-4-cookbook/book
Resources - Books
● Introduction to Information Retrieval
○ not specific to Lucene/Solr, but about IR concepts
○ free e-book
○ http://nlp.stanford.edu/IR-book/
● Managing Gigabytes
○ indexing, compression and other topics
○ accompanied by MG4J - a full-text search software
○ http://mg4j.di.unimi.it/
Resources - Web
● official websites
○ Lucene Core - http://lucene.apache.org/core/
○ Solr - http://lucene.apache.org/solr/
● mailing lists
● Wiki sites
○ Lucene Core - http://wiki.apache.org/lucene-java/
○ Solr - http://wiki.apache.org/solr/
● reference guides
○ API Documentation for Lucene and Solr
○ Apache Solr Reference Guide (LucidWorks) -
http://lucene.apache.org/solr/tutorial.html
Getting Started
● download Solr
○ requires Java 6 or newer to run
● Solr comes bundled and configured with
Jetty
○ <Solr directory>/example/start.jar
● "exampledocs" directory contains sample
documents
○ <Solr directory>/example/exampledocs/post.jar
● use the Solr admin interface
○ http://localhost:8983/solr/

Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 

Search Engine-Building with Lucene and Solr

  • 1. Search Engine-Building with Lucene and Solr Kai Chan SoCal Code Camp, July 2013
  • 2. How to Search - One Approach for each document d { if (query is a substring of d's content) { add d to the list of results } } sort the result (or not)
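The loop on this slide can be written out as a minimal Python sketch. The function name `naive_search` and the sample documents (borrowed from the inverted-index example on slide 5) are illustrative assumptions, not part of the talk.

```python
def naive_search(documents, query):
    """Return the ids of documents whose content contains the query as a substring."""
    results = []
    for doc_id, content in enumerate(documents):
        # Every search reads every document, which is why this approach is slow.
        if query in content:
            results.append(doc_id)
    return results


docs = ["it is what it is", "what is it", "it is a banana"]
print(naive_search(docs, "banana"))
```

The cost grows linearly with the size of the dataset, which is the scalability problem the next slide describes.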
  • 3. How to Search - Problems ● slow ○ reads the whole dataset for each search ● not scalable ○ if your dataset grows by 10x, your search slows down by 10x ● how to show the most relevant documents first? ○ list of results can be quite long ○ users have limited time and patience
  • 4. Inverted Index - Introduction ● like the "index" at the end of books ● a map of one of the following types ○ term → document list ○ term → <document, position> list
  • 5. documents:
      T[0] = "it is what it is"
      T[1] = "what is it"
      T[2] = "it is a banana"
    inverted index (without positions):
      "a": {2}
      "banana": {2}
      "is": {0, 1, 2}
      "it": {0, 1, 2}
      "what": {0, 1}
    inverted index (with positions):
      "a": {(2, 2)}
      "banana": {(2, 3)}
      "is": {(0, 1), (0, 4), (1, 1), (2, 1)}
      "it": {(0, 0), (0, 3), (1, 2), (2, 0)}
      "what": {(0, 2), (1, 0)}
    Credit: Wikipedia (http://en.wikipedia.org/wiki/Inverted_index)
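The two index shapes in this example can be built with a few lines of Python. This is a sketch that assumes whitespace tokenization and no normalization; the function name `build_inverted_index` is hypothetical, and Lucene's real analysis chain and index format are considerably more involved.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Build both index shapes: term -> document ids, and term -> (document, position) pairs."""
    doc_index = defaultdict(set)
    positional_index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        # Assumption: terms are separated by whitespace and used as-is.
        for position, term in enumerate(text.split()):
            doc_index[term].add(doc_id)
            positional_index[term].append((doc_id, position))
    return dict(doc_index), dict(positional_index)


documents = ["it is what it is", "what is it", "it is a banana"]
doc_index, positional_index = build_inverted_index(documents)
print(sorted(doc_index["what"]))
print(positional_index["is"])
```

Looking up a term is then a single dictionary access, independent of how many documents were indexed.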
  • 6. Inverted Index - Speed ● term list ○ typically very small ○ grows slowly ● term lookup ○ O(1) to O(log(number of terms)) ● for a particular term ○ document lists: very small ○ document + position lists: still small ● few terms per query
  • 7. Inverted Index - Relevance ● information in the index enables: ○ determination (scoring) of relevance of each document to the query ○ comparison of relevance among documents ○ sorting by (decreasing) relevance ■ i.e. the most relevant document first
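To illustrate how the index alone supports scoring and sorting, here is a deliberately crude sketch that scores each document by how often the query terms occur in it. Raw term frequency is a stand-in chosen for brevity; it is not Lucene's scoring formula, and the function name `rank_by_term_frequency` is an assumption.

```python
from collections import Counter


def rank_by_term_frequency(positional_index, query_terms):
    """Score documents by total occurrences of the query terms, most relevant first."""
    scores = Counter()
    for term in query_terms:
        # Each (document, position) entry is one occurrence of the term.
        for doc_id, _position in positional_index.get(term, []):
            scores[doc_id] += 1
    # Sort by decreasing score; break ties by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


positional_index = {
    "a": [(2, 2)],
    "banana": [(2, 3)],
    "is": [(0, 1), (0, 4), (1, 1), (2, 1)],
    "it": [(0, 0), (0, 3), (1, 2), (2, 0)],
    "what": [(0, 2), (1, 0)],
}
print(rank_by_term_frequency(positional_index, ["it", "is"]))
```

Only the posting lists of the query terms are read, so the work depends on the query rather than on the whole dataset.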
  • 8. Lucene vs. Solr - Lucene ● full-text search library ● creates, updates and reads from the index ● takes queries and produces search results ● your application creates objects and calls methods in the Lucene API ● provides building blocks for custom features
  • 9. Lucene vs. Solr - Solr ● full-text search server ● uses Lucene for indexing and search ● REST-like API over HTTP ● different output formats (e.g. XML, JSON) ● provides some features not built into Lucene
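  • 10. [Diagram] Lucene: a machine running a Java VM hosts your application, which contains the Lucene code, libraries and the index. Solr: a client connects over HTTP to a machine running a Java VM with a servlet container (e.g. Tomcat, Jetty); the container hosts Solr, which contains the Solr code, Lucene code, libraries and the index.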
  • 13. Workflow - Setup ● servlet configuration ○ e.g. port number, max POST size ○ you can usually use the default settings ● Solr configuration ○ e.g. data directory, deduplication, language identification, highlighting ○ you can usually use the default settings ● schema definition ○ defines fields in your documents ○ you can use the default settings if you name your fields in a certain way
  • 14. How Data Are Organized [Diagram] a collection contains documents; each document contains fields
  • 15. [Diagram] a field has: ○ name (e.g. "title" or "price") ○ content (e.g. "please read" or 30) ○ type ○ options
  • 17. [Diagram] an index holding three documents, each with its own fields; field names shown: subject, date, from, title, SKU, price, last name, phone, text, description, first name, address
  • 18. Solr Field Definition ● field ○ name (e.g. "subject") ○ type (e.g. "text_general") ○ options (e.g. indexed="true" stored="true") ● field type ○ text: "string", "text_general" ○ numeric: "int", "long", "float", "double" ● options ○ indexed: content can be searched ○ stored: content can be returned at search-time ○ multivalued: multiple values per field & document
  • 19. Solr Dynamic Field ● define field by naming convention ● "amount_i": int, indexed, stored ● "tag_ss": string, indexed, stored, multivalued
      name    type          indexed  stored  multiValued
      *_i     int           true     true    false
      *_l     long          true     true    false
      *_f     float         true     true    false
      *_d     double        true     true    false
      *_s     string        true     true    false
      *_ss    string        true     true    true
      *_t     text_general  true     true    false
      *_txt   text_general  true     true    true
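The naming convention in this table can be mimicked with a small suffix lookup. This sketch only mirrors the table on the slide; it is not how Solr resolves dynamic fields internally, and `resolve_dynamic_field` is a hypothetical helper.

```python
# (suffix, type, multiValued) rows copied from the slide's table.
# All rows on the slide are indexed and stored, so those flags are omitted here.
DYNAMIC_FIELD_RULES = [
    ("_txt", "text_general", True),
    ("_ss", "string", True),
    ("_i", "int", False),
    ("_l", "long", False),
    ("_f", "float", False),
    ("_d", "double", False),
    ("_s", "string", False),
    ("_t", "text_general", False),
]


def resolve_dynamic_field(field_name):
    """Return (type, multi_valued) for a field name, or None if no suffix matches."""
    for suffix, field_type, multi_valued in DYNAMIC_FIELD_RULES:
        if field_name.endswith(suffix):
            return field_type, multi_valued
    return None


print(resolve_dynamic_field("amount_i"))
print(resolve_dynamic_field("tag_ss"))
```

A name such as "title" matches no suffix, so it would need an explicit field definition in the schema.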
  • 20. Solr Copy Field ● copy one or more fields into another field ● can be used to define a catch-all field ○ source: "title", "author", "description" ○ destination: "text" ○ searching the "text" field has the effect of searching all the other three fields
  • 22. Indexing - UpdateRequestHandler ● upload content or file to http://host:port/solr/update ● formats: XML, JSON, CSV
  • 23. XML:
      <add>
        <doc>
          <field name="id">apple</field>
          <field name="compName">Apple</field>
          <field name="address">1 Infinite Way, Cupertino CA</field>
        </doc>
        <doc>
          <field name="id">asus</field>
          <field name="compName">ASUS Computer</field>
          <field name="address">800 Corporate Way Fremont, CA 94539</field>
        </doc>
      </add>
    CSV:
      id,compName_s,address_s
      apple,Apple,"1 Infinite Way, Cupertino CA"
      asus,Asus Computer,"800 Corporate Way Fremont, CA 94539"
    JSON:
      [
        {"id":"apple","compName_s":"Apple","address_s":"1 Infinite Way, Cupertino CA"},
        {"id":"asus","compName_s":"Asus Computer","address_s":"800 Corporate Way Fremont, CA 94539"}
      ]
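As a sketch of how the JSON form might be posted from a script, the snippet below builds (but does not send) an HTTP request using only Python's standard library. The host and port are assumptions taken from the Getting Started slide, the `Content-Type` header is an assumption about what the update handler expects, and `build_update_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_update_request(docs, base_url="http://localhost:8983/solr"):
    """Build a POST request that carries the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update",  # update path as given on the UpdateRequestHandler slide
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


docs = [
    {"id": "apple", "compName_s": "Apple", "address_s": "1 Infinite Way, Cupertino CA"},
    {"id": "asus", "compName_s": "Asus Computer", "address_s": "800 Corporate Way Fremont, CA 94539"},
]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running server; that step is left out so the sketch runs without one.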
  • 24. Indexing - DataImportHandler ● has its own config file (data-config.xml) ● import data from various sources ○ RDBMS (JDBC) ○ e-mail (IMAP) ○ XML data locally (file) or remotely (HTTP) ● transformers ○ extract data (RegEx, XPath) ○ manipulate data (strip HTML tags)
  • 26. Searching - Basics ● send request to http://host:port/solr/search ● parameters ○ q - main query ○ fq - filter query ○ defType - query parser (e.g. lucene, edismax) ○ fl - fields to return ○ sort - sort criteria ○ wt - response writer (e.g. xml, json) ○ indent - set to true for pretty-printing
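The parameters listed on this slide are ordinary URL query parameters, so a search URL can be assembled with the standard library. In this sketch the base URL combines the path from the slide with the host and port from the Getting Started slide, which is an assumption; the field names in the example query are made up, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(query, base_url="http://localhost:8983/solr/search", **params):
    """Return a search URL with q set to the main query plus any extra parameters."""
    all_params = {"q": query}
    all_params.update(params)
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return base_url + "?" + urlencode(all_params)


url = build_search_url(
    "title:tablet",
    fq="price:[10 TO 20]",
    fl="id,title",
    sort="price asc",
    wt="json",
    indent="true",
)
print(url)
```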
  • 28. Searching - Query Syntax - Field ● search a specific field ○ field_name:value ● if field omitted, Solr uses default field: ○ df parameter in URL ○ defaultSearchField setting in schema.xml ○ "text"
  • 29. Searching - Query Syntax - Term ● a term by itself: matches documents that contain that term ○ e.g. tablet
  • 30. Searching - Query Syntax - Boolean ● boolean operators are supported ○ AND && ○ OR || ○ NOT ! ● e.g. a AND b ○ all of a, b must occur ● e.g. a OR b ○ at least one of a, b must occur ● e.g. a AND NOT b ○ a must occur and b must not occur
  • 31. Searching - Query Syntax - Boolean ● Lucene/Solr's boolean operators are not true boolean operators ● e.g. a OR b OR c does not behave like (a OR b) OR c ○ instead, a OR b OR c means at least one of a, b, c must occur ● parentheses are supported
  • 32. Searching - Query Syntax - Boolean ● "+" prefix means "must" ● "-" prefix means "must not" ● no prefix means "at least one must" (by default) ○ e.g. a b c ■ at least one of a, b, c must occur ● operators can mix ○ e.g. +a b c d -e ■ a must occur ■ b, c, d are optional here: once a "must" clause is present, unprefixed clauses only raise the score of documents that match them ■ e must not occur
  • 33. Searching - Query Syntax - Phrase ● phrases are enclosed by double-quotes ● e.g. +"the phrase" ○ the phrase must occur ● e.g. -"the phrase" ○ the phrase must not occur
  • 34. Searching - Query Syntax - Boost ● manually assign different weights to clauses ● gives more weight to a field ○ e.g. title:a^10 body:a ● gives more weight to a word ○ e.g. title:a title:b^10 ● gives phrases more weight than words ○ e.g. title:(+a +b) title:"a b"^10
  • 35. Searching - Query Syntax - Range ● matches field values within a range ○ inclusive range - denoted by square brackets ○ exclusive range - denoted by curly brackets ● e.g. age:[10 TO 20] ○ matches the field "age" with the value in 10..20 ● string or numeric comparison, depending on the field's type
  • 36. Searching - Query Syntax - EDisMax ● suitable for user-generated queries ○ supports a subset of Lucene QP's syntax ○ does not complain about the syntax ○ searches for individual words across several fields ("disjunction") ○ uses max score of a word in all fields for scoring ("max") ● configurable (in solrconfig.xml) ○ what fields to search the words in ○ weighting of these fields
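The "disjunction" and "max" ideas can be shown with a toy scorer: each word is looked for in every configured field, and only its best-scoring field counts. This is a simplified sketch that uses weighted term counts as the per-field score; the real parser's scoring has more inputs, and the function name and sample data are assumptions.

```python
def dismax_score(doc, words, field_weights):
    """Sum, over the query words, the best weighted match of each word across the fields."""
    total = 0.0
    for word in words:
        per_field = [
            weight * doc.get(field, "").lower().split().count(word)
            for field, weight in field_weights.items()
        ]
        # "max": only the field where the word scores highest contributes.
        total += max(per_field, default=0.0)
    return total


doc = {"title": "solr in action", "body": "solr uses lucene and solr is a server"}
weights = {"title": 10, "body": 1}
print(dismax_score(doc, ["solr", "lucene"], weights))
```

Here "solr" scores through the heavily weighted title even though it occurs more often in the body, which is the effect of weighting fields in the configuration.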
  • 37. Sorting ● default: sorting by decreasing score ● sorting by field: using the sort parameter ○ specify field name and order ■ price asc - sort by "price" field, ascending ■ price desc - sort by "price" field, descending ○ multiple fields and orders by comma ■ starRating desc, price asc - sort by "starRating" field, descending, and then by "price" field, ascending ○ cannot use multivalued fields ○ overrides sorting by decreasing relevance
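The meaning of a multi-field sort such as "starRating desc, price asc" can be sketched on plain Python dictionaries. This only illustrates the ordering semantics on in-memory data; it says nothing about how Solr sorts internally, and `sort_documents` and the sample documents are assumptions.

```python
def sort_documents(docs, sort_spec):
    """Order docs by a spec like 'starRating desc, price asc' (single-valued fields only)."""
    criteria = []
    for clause in sort_spec.split(","):
        field, order = clause.split()
        criteria.append((field, order == "desc"))
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant field first
    # and the most significant field last yields the combined ordering.
    for field, descending in reversed(criteria):
        result.sort(key=lambda doc: doc[field], reverse=descending)
    return result


docs = [
    {"id": "a", "starRating": 4, "price": 30},
    {"id": "b", "starRating": 5, "price": 50},
    {"id": "c", "starRating": 5, "price": 20},
]
ordered = sort_documents(docs, "starRating desc, price asc")
print([doc["id"] for doc in ordered])
```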
  • 38. Faceted Search ● facet values: (distinct) values or (generally non-overlapping) ranges of a field ● displaying facets ○ show possible values ○ let users narrow down their searches easily
  • 40. Faceted Search ● set facet parameter to true - enables faceting ● other parameters ○ facet.field - use the field's values as facets ■ return <value, count> pairs ○ facet.query - use the given queries as facets ■ return <query, count> pairs ○ facet.sort - set the ordering of the facets ■ can be "count" or "index" ○ facet.offset and facet.limit - used for pagination of facets
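What facet.field, facet.sort, facet.offset and facet.limit compute can be sketched over an in-memory result list. The function below is an illustration of the <value, count> idea only; its name, its default arguments and the sample documents are assumptions, not Solr's implementation.

```python
from collections import Counter


def facet_field(docs, field, sort="count", offset=0, limit=100):
    """Return (value, count) pairs for a field, ordered and paginated."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    if sort == "count":
        # Highest count first; ties broken by value.
        pairs = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    else:
        # "index": order by the facet value itself.
        pairs = sorted(counts.items())
    return pairs[offset:offset + limit]


docs = [
    {"id": 1, "brand": "zeta"},
    {"id": 2, "brand": "zeta"},
    {"id": 3, "brand": "acme"},
    {"id": 4, "brand": "beta"},
]
print(facet_field(docs, "brand"))
print(facet_field(docs, "brand", sort="index"))
```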
  • 41. Resources - Books ● Lucene in Action ○ written by 3 committers and PMC members ○ somewhat outdated (2010; covers Lucene 3.0) ○ http://www.manning.com/hatcher3/ ● Solr in Action ○ early access; coming out later this year ○ http://www.manning.com/grainger/ ● Apache Solr 4 Cookbook ○ common problems and useful tips ○ http://www.packtpub.com/apache-solr-4-cookbook/book
  • 42. Resources - Books ● Introduction to Information Retrieval ○ not specific to Lucene/Solr, but about IR concepts ○ free e-book ○ http://nlp.stanford.edu/IR-book/ ● Managing Gigabytes ○ indexing, compression and other topics ○ accompanied by MG4J - a full-text search software ○ http://mg4j.di.unimi.it/
  • 43. Resources - Web ● official websites ○ Lucene Core - http://lucene.apache.org/core/ ○ Solr - http://lucene.apache.org/solr/ ● mailing lists ● Wiki sites ○ Lucene Core - http://wiki.apache.org/lucene-java/ ○ Solr - http://wiki.apache.org/solr/ ● reference guides ○ API Documentation for Lucene and Solr ○ Apache Solr Reference Guide (LucidWorks) - http://lucene.apache.org/solr/tutorial.html
  • 44. Getting Started ● download Solr ○ requires Java 6 or newer to run ● Solr comes bundled and configured with Jetty ○ <Solr directory>/example/start.jar ● "exampledocs" directory contains sample documents ○ <Solr directory>/example/exampledocs/post.jar ● use the Solr admin interface ○ http://localhost:8983/solr/