Solr China, June 21: Enterprise Search

ENTERPRISE SEARCH
an introduction

Web Search
Desktop Search
Enterprise Search

A piece of SOFTWARE that:
• builds an index on text
• answers queries using that index

Any search application has two major components:
• INDEXING component - of importance to us developers (read: headache)
• SEARCH component - of importance to the users

[Diagram]
INDEXING component: data is indexed into INDEX FILES.
SEARCH component: the user sends a search query and receives search results.

Is it easy to search here...?

• That's information, like garbage:
• no structure
• comes in all kinds of shapes, sizes and formats

• And this is what indexing does:
• it makes data accessible in a structured format, easily reached through search.

So what needs to be indexed and searched?

various FILE FORMATS
Text Files
HTML
PDF
MS Word
PPT

coming from various DATA SOURCES
Emails
CMS
File System
Database
Web Pages

[Diagram: indexing with an Analyzer]
data (documents) → Analyzer → Index Writer → INDEX FILES
user → sends search query → receives search results

The text that should be indexed is fed to an Analyzer, which:
• removes stop words such as "a" or "the"
• converts all text to lowercase letters for case-insensitive searching
• applies stemming (a stemming algorithm reduces the words "fishing", "fished", "fish" and "fisher" to the root word, "fish")
The Analyzer passes the tokenized text to the Index Writer, which writes the INDEX FILES.
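The three Analyzer steps can be sketched as a small function. This is a minimal illustration, not Lucene's Analyzer: the stop-word list is a hypothetical, tiny one, and the stemmer is a naive suffix stripper (not the Porter algorithm) that will overstem words such as "better".

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "is"}

# Naive suffix stripping, NOT a real stemming algorithm such as Porter's.
SUFFIXES = ("ing", "ed", "er", "s")


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split into word tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Fishing, fished, fish and the Fisher"))
```

Lowercasing happens before the stop-word check so that "The" and "the" are treated alike, which is the case-insensitive behaviour described above.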

Document 1:
Coffee isn't my cup of tea.
Document 2:
Chocolate, men, coffee - some things are better rich.
INDEX
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
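The index above is an inverted index: each term maps to the IDs of the documents that contain it. A minimal sketch that rebuilds it from the two documents follows. The stop-word list is a hypothetical one chosen so the output matches the example, and no stemming is applied, as in the example.

```python
import re
from collections import defaultdict

# Hypothetical stop words, chosen so the result matches the example index.
STOP_WORDS = {"a", "the", "isn't", "my", "of", "some", "are"}


def tokenize(text):
    """Lowercase the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """Map each non-stop-word term to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in STOP_WORDS:
                postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}
index = build_index(docs)
for term, ids in index.items():
    print(term, "-", ",".join(str(i) for i in ids))
```

Answering a one-word query is then a single dictionary lookup, for example `index["coffee"]`, instead of a scan over every document.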

[Diagram]
data is indexed into INDEX FILES
user sends a search query (search terms) and receives search results

Search request flow:
Search Request Terms → Taxonomy and Spelling Index
→ Correct Search Terms + Incorrect Search Terms
→ Search Terms + Related Terms from Taxonomy + Concept IDs
→ Search engine (INDEX)
→ Search results with:
1) Actual location of the result
2) Rank
3) Details
4) Facet categorization
→ Results' Page
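The request flow above can be sketched as a toy pipeline. It corrects misspelled terms against a spelling index, adds related terms from a taxonomy, looks everything up in an inverted index, and ranks documents by how many expanded terms they match. The spelling list, taxonomy, index contents and the count-based ranking are all hypothetical. `difflib.get_close_matches` stands in for a real spell checker, and concept IDs and facets are left out.

```python
from difflib import get_close_matches

# Hypothetical data for illustration only.
SPELLING_INDEX = ["coffee", "chocolate", "tea", "cup"]
TAXONOMY = {"coffee": ["espresso"], "tea": ["chai"]}
INDEX = {"coffee": [1, 2], "cup": [1], "tea": [1], "chocolate": [2], "espresso": [3]}


def expand(terms):
    """Correct each term against the spelling index, then add taxonomy relatives."""
    expanded = []
    for term in terms:
        match = get_close_matches(term, SPELLING_INDEX, n=1, cutoff=0.8)
        fixed = match[0] if match else term  # keep unknown terms unchanged
        expanded.append(fixed)
        expanded.extend(TAXONOMY.get(fixed, []))
    return expanded


def search(terms):
    """Return (doc_id, matched_term_count) pairs, best match first."""
    hits = {}
    for term in expand(terms):
        for doc_id in INDEX.get(term, []):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


print(search(["cofee", "tea"]))
```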

Lucene
• full-text search library
• open source
• documents in XML format (when indexed through Solr)
• can operate on its own or via Solr

Ways of storing fields of any document:
• Indexed: the field is searchable.
• Stored: the original content is kept so it can be displayed in the search results; you may choose not to make such a field searchable. Example: "summary associated with a page".
• Tokenized: the content is run through an Analyzer, which converts it into a sequence of tokens.
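The three options can be modelled as independent flags on each field. This is a hypothetical sketch, not Lucene's field API. The field names and the whitespace tokenizer are assumptions. It shows that a stored-but-not-indexed field such as a page summary comes back with the results but can never be matched by a query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    indexed: bool    # terms go into the index, so the field is searchable
    stored: bool     # original value is kept for display in results
    tokenized: bool  # value is split into tokens before indexing


# Hypothetical field definitions for a web page.
FIELDS = {
    "url": FieldDef(indexed=True, stored=True, tokenized=False),
    "body": FieldDef(indexed=True, stored=False, tokenized=True),
    "summary": FieldDef(indexed=False, stored=True, tokenized=False),
}


def add_document(doc_id, doc, index, store):
    """Index and/or store each field of doc according to its FieldDef."""
    kept = {}
    for name, value in doc.items():
        spec = FIELDS[name]
        if spec.indexed:
            # Tokenized fields become lowercase words; others stay one exact term.
            terms = value.lower().split() if spec.tokenized else [value]
            for term in terms:
                index.setdefault((name, term), set()).add(doc_id)
        if spec.stored:
            kept[name] = value
    store[doc_id] = kept


index, store = {}, {}
add_document(
    1,
    {
        "url": "http://example.com/solr",
        "body": "Solr is a full-text search server",
        "summary": "An intro to Solr",
    },
    index,
    store,
)
```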

Introducing SOLR
[Diagram: Solr sits on top of Lucene, which manages the Index.]

• open source
• passes index and query requests to Lucene via HTTP and XML (also JSON)
• manages document update, add and delete requests to Lucene
• straightforward schema and config files
• comprehensive HTML admin interface
• highly configurable

HTTP POST to /update
<add><doc boost="2">
<field name="type">05991</field>
<field name="from">Apache Solr</field>
<field name="subject">An intro...</field>
<field name="category">search</field>
<field name="category">lucene</field>
<field name="body">Solr is a full...</field>
</doc></add>
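The same add message can be generated in code rather than written by hand, which also takes care of XML escaping. This is a sketch using only the Python standard library. The endpoint `http://localhost:8983/solr/update` is an assumption that matches the `/solr/select` URL used in this deck, and the request is only constructed here, not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def build_add_xml(fields, boost=None):
    """Build an <add><doc> message; repeated names give multi-valued fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if boost is not None:
        doc.set("boost", str(boost))
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > automatically
    return ET.tostring(add, encoding="unicode")


xml_body = build_add_xml(
    [
        ("type", "05991"),
        ("from", "Apache Solr"),
        ("subject", "An intro..."),
        ("category", "search"),
        ("category", "lucene"),
        ("body", "Solr is a full..."),
    ],
    boost=2,
)

# Assumed endpoint; urllib.request.urlopen(req) would send it to a running Solr.
req = Request(
    "http://localhost:8983/solr/update",
    data=xml_body.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
```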

schema.xml
defines how each field is indexed and displayed

solrconfig.xml
defines cache sizes, faceted field types and request handler customizations

Deleting Documents
• Delete by Id
<delete><id>05591</id></delete>
• Delete by Query (multiple documents)
<delete>
<query>manufacturer:microsoft</query>
</delete>
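Both delete forms are small XML messages that are posted to the same update endpoint as additions. A minimal standard-library sketch of building them follows. The helper names are hypothetical. Building the XML with ElementTree rather than string formatting escapes characters such as `&` inside queries.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Return a <delete><id>...</id></delete> message."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Return a <delete><query>...</query></delete> message (may match many docs)."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("05591"))
print(delete_by_query("manufacturer:microsoft"))
```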

Search Results
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

Default Parameters
param   default    description
q                  The query
start   0          Offset into the list of matches
rows    10         Number of documents to return
fl      *          Stored fields to return
qt      standard   Query type; maps to query handler
df      (schema)   Default field to search
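A query URL is just the `/select` path plus URL-encoded parameters from the table. This is a minimal sketch with a hypothetical helper name. The base URL is the one from this deck, and parameters that are left out fall back to the defaults listed above on the Solr side.

```python
from urllib.parse import urlencode

# Base URL from the example in this deck; adjust for your own install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(q, **params):
    """Build a /select URL; omitted params use Solr's defaults (start=0, rows=10, fl=*)."""
    query = {"q": q, **params}
    # safe="," keeps field lists such as "name,price" readable instead of %2C
    return SOLR_SELECT + "?" + urlencode(query, safe=",")


print(select_url("video", start=0, rows=2, fl="name,price"))
```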
Response to http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price:

<response><responseHeader><status>0</status>
<QTime>1</QTime></responseHeader>
<result numFound="16173" start="0">
<doc>
<str name="name">Apple 60 GB iPod with Video</str>
<float name="price">399.0</float>
</doc>
<doc>
<str name="name">ASUS Extreme N7800GTX/2DHTV</str>
<float name="price">479.95</float>
</doc>
</result>
</response>
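A client reads this response by taking the `numFound` attribute of `<result>` and converting each typed child of `<doc>` (`str`, `float`, ...) according to its tag. This is a minimal standard-library sketch. The function name and the tag-to-type mapping are assumptions that cover only the types seen here plus `int`/`long`.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<response><responseHeader><status>0</status>
<QTime>1</QTime></responseHeader>
<result numFound="16173" start="0">
<doc>
<str name="name">Apple 60 GB iPod with Video</str>
<float name="price">399.0</float>
</doc>
<doc>
<str name="name">ASUS Extreme N7800GTX/2DHTV</str>
<float name="price">479.95</float>
</doc>
</result>
</response>"""


def parse_select_response(xml_text):
    """Return (numFound, list of field dicts) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for el in doc:
            name = el.get("name")
            if el.tag == "float":
                fields[name] = float(el.text)
            elif el.tag in ("int", "long"):
                fields[name] = int(el.text)
            else:  # str and anything else kept as text
                fields[name] = el.text
        docs.append(fields)
    return int(result.get("numFound")), docs


num_found, docs = parse_select_response(RESPONSE)
```

`numFound` counts all matches (16173), while only `rows=2` documents are actually returned.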

[Diagram: Solr architecture]
Solr Core, built on Lucene
• Admin Interface
• Request handlers: Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler
• Update Handler and XML Update Interface
• Caching, Config, Analysis, Concurrency, Replication, Schema
• HTTP Request Servlet (search requests hit here)
• Update Servlet (new documents to be added go here)
• XML Response Writer
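The `qt` parameter from the table above picks which request handler serves a query. The toy model below dispatches on `qt` to a "standard" handler that sums per-field scores and a "dismax" handler. The dismax handler uses the disjunction-max idea: the best field's score plus a tie-breaker fraction of the other fields' scores. Everything here is a hypothetical sketch. Scoring a field by how many times the term occurs stands in for Lucene's real relevance formula, and the tie value 0.1 is an arbitrary choice.

```python
def field_scores(doc, term, fields):
    """Toy relevance: number of occurrences of term in each field."""
    return [doc.get(f, "").lower().split().count(term) for f in fields]


def standard_handler(doc, term, fields):
    """Sum the per-field scores."""
    return float(sum(field_scores(doc, term, fields)))


def dismax_handler(doc, term, fields, tie=0.1):
    """Disjunction max: best field score plus tie * the other fields' scores."""
    scores = field_scores(doc, term, fields)
    best = max(scores)
    return best + tie * (sum(scores) - best)


# Hypothetical registry mapping qt values to handlers.
HANDLERS = {"standard": standard_handler, "dismax": dismax_handler}


def handle(doc, term, fields, qt="standard"):
    """Dispatch to the handler named by qt (default: standard)."""
    return HANDLERS[qt](doc, term, fields)


doc = {"name": "video ipod video", "features": "plays video"}
print(handle(doc, "video", ["name", "features"]))
print(handle(doc, "video", ["name", "features"], qt="dismax"))
```

With dismax, a term that appears in many fields does not add up as quickly as with a plain sum, so the single best-matching field dominates the score.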

FAQ
Website: www.solr.cc
QQ group: 187670960
