Using Apache Solr

Intro to a full-text search solution, Apache Solr

Presentation Transcript

  • Full Text Search with Apache Solr Pittaya Sroilong pittaya@gmail.com
  • Who am I?
  • Solr?
  • Not her!
  • But a search server
  • based on Lucene
  • Lucene?
  • Full-text search library
  • 100% Java :-(
  • Solr is based on Lucene
  • XML/HTTP, JSON interface
  • Open Source
  • Shields us from using Java :-)
  • Who uses Solr/Lucene?
  • What is our problem?
  • How do we implement this?
  • SELECT * FROM post WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' ORDER BY id DESC
  • SELECT * FROM post WHERE (topic LIKE '%aoi%' OR author LIKE '%aoi%') OR (topic LIKE '%miyabi%' OR author LIKE '%miyabi%') ORDER BY id DESC
  • Full table scan = Performance killer
  • No search scoring
  • RDBMS isn’t designed to do this
  • Use the right tool!
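
To make the contrast with LIKE '%...%' scans concrete, here is a minimal Python sketch of the general idea behind a full-text index: an inverted index that maps words to documents, plus a crude match-count score. It is a toy illustration only, not Lucene's actual data structures or scoring formula, and the sample posts are made up.

```python
from collections import defaultdict

def build_inverted_index(posts):
    """Map each lowercased word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            index[word].add(post_id)
    return index

def search(index, query):
    """Score each post by how many query words it contains.

    This match count is a crude stand-in for real relevance scoring.
    """
    scores = defaultdict(int)
    for word in query.lower().split():
        # Look up only the posts listed under this word; no full scan.
        for post_id in index.get(word, ()):
            scores[post_id] += 1
    # Highest score first; ties broken by post id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample data, for illustration only.
posts = {
    1: "aoi concert photos",
    2: "miyabi and aoi interview",
    3: "weekend football results",
}
index = build_inverted_index(posts)
results = search(index, "aoi miyabi")  # list of (post_id, score) pairs
```

Each query word is a dictionary lookup rather than a scan over every row, and posts that match more words rank higher, which is the kind of ordering the SQL version cannot provide.
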
  • [Architecture diagram: the Indexer updates the index in Solr (built on Lucene); the Web App sends a query to Solr and gets back a result]
  • 1
  • Define schema.xml
  • <field name="id" type="string" indexed="true" stored="true" />
    <field name="fullname" type="string" indexed="true" stored="true" />
    <field name="position" type="string" indexed="true" stored="true" />
    <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
  • 2
  • Deploy on any J2EE container
  • Tomcat, Jetty, etc.
  • 3
  • Index documents
  • Document format
    <add><doc>
      <field name="id">555</field>
      <field name="fullname">Kaka</field>
      <field name="position">Midfielder</field>
      <field name="tag">AC Milan</field>
      <field name="tag">Brazil</field>
    </doc></add>
  • Post to Solr http://<host>/solr/update
  • Any language that can do HTTP POST
  • PHP, Perl, Python
  • cURL
  • Commit <commit />
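
The indexing step above (build an add document, POST it, then commit) could be sketched in Python roughly as follows. The helper names `build_add_xml` and `post_xml` and the `localhost:8983` host are assumptions made for illustration; only the XML shape and the /solr/update path come from the slides.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    List values become repeated <field> elements, which is how a
    multiValued field such as "tag" is sent.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")

def post_xml(update_url, xml_body):
    """POST an XML message to the update URL and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

add_xml = build_add_xml({
    "id": "555",
    "fullname": "Kaka",
    "position": "Midfielder",
    "tag": ["AC Milan", "Brazil"],
})

# Hypothetical usage; replace the host with your own Solr server:
# post_xml("http://localhost:8983/solr/update", add_xml)
# post_xml("http://localhost:8983/solr/update", "<commit />")
```

Building the message with ElementTree rather than string concatenation means field values containing characters like & or < are escaped correctly.
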
  • 4
  • Search
  • Query from http://<host>/solr/select
  • Use Solr query syntax
  • http://<host>/solr/select?q=tag:madrid&start=0&rows=2&fl=fullname,position,tag
  • Response in XML or JSON (configurable)
  • <response>
      <result numFound="46" start="0">
        <doc>
          <str name="fullname">Sergio Ramos</str>
          <str name="position">Defender</str>
          <str name="tag">Real Madrid</str>
          <str name="tag">Spain</str>
        </doc>
        <doc>
          <str name="fullname">Diego Forlan</str>
          <str name="position">Striker</str>
          <str name="tag">Atletico Madrid</str>
          <str name="tag">Uruguay</str>
        </doc>
      </result>
    </response>
  • &wt=json
  • {
      "result": {
        "numFound": 46,
        "start": 0,
        "docs": [
          { "fullname": "Sergio Ramos", "position": "Defender", "tag": ["Real Madrid", "Spain"] },
          { "fullname": "Diego Forlan", "position": "Striker", "tag": ["Atletico Madrid", "Uruguay"] }
        ]
      }
    }
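
As a rough sketch of the search step, the Python below builds a select URL with the parameters from the slides (`q`, `start`, `rows`, `wt`, plus `fl`, Solr's field-list parameter) and reads names out of a JSON payload. The helper names and the `localhost:8983` host are assumptions. The sample payload mirrors the simplified shape on the slide; a real server typically wraps results under a "response" key, so check your own output before relying on the key name.

```python
import json
from urllib.parse import urlencode

def build_select_url(base_url, query, start=0, rows=10, fields=None, wt="json"):
    """Build a select URL; urlencode percent-escapes ':' and ',' for us."""
    params = {"q": query, "start": start, "rows": rows, "wt": wt}
    if fields:
        params["fl"] = ",".join(fields)  # fl = list of fields to return
    return base_url + "?" + urlencode(params)

def extract_names(payload, key="result"):
    """Return the fullname of each document under payload[key]["docs"]."""
    return [doc["fullname"] for doc in payload[key]["docs"]]

# Hypothetical host; only the /solr/select path comes from the slides.
url = build_select_url(
    "http://localhost:8983/solr/select",
    "tag:madrid",
    rows=2,
    fields=["fullname", "position", "tag"],
)

# Sample payload copied from the slide's simplified JSON example.
sample = json.loads("""
{ "result": { "numFound": 46, "start": 0, "docs": [
    { "fullname": "Sergio Ramos", "position": "Defender",
      "tag": ["Real Madrid", "Spain"] },
    { "fullname": "Diego Forlan", "position": "Striker",
      "tag": ["Atletico Madrid", "Uruguay"] } ] } }
""")
names = extract_names(sample)
```

Passing `key="response"` to `extract_names` adapts the same helper to a payload that uses that top-level key.
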
  • Query examples
  • David Pizzarro
    • Equiv: David OR Pizzarro
    • Default operator is “OR” (configurable)
    • Result: David Villa, David Pizzarro, Claudio Pizzarro, David Seaman
  • +David +tag:Roma
    • Equiv: David AND tag:Roma
    • Result: David Pizzarro
  • +David +position:(Striker OR Midfielder)
    • Result: David Villa, David Pizzarro
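
The difference between optional (default OR) and required (`+`) clauses in the query examples can be mimicked with a toy matcher. This is not Solr's query parser: clauses are passed in as Python functions rather than parsed from a string, unfielded terms are assumed to search `fullname`, and the positions and tags in the sample data are made up so that the results line up with the slides.

```python
def name_has(word):
    """Clause: the word appears in the document's fullname."""
    return lambda doc: word.lower() in doc["fullname"].lower().split()

def field_is(field, *values):
    """Clause: the field (single or multi-valued) equals any given value."""
    wanted = {v.lower() for v in values}
    def check(doc):
        raw = doc[field]
        items = raw if isinstance(raw, list) else [raw]
        return any(item.lower() in wanted for item in items)
    return check

def match(docs, optional=(), required=()):
    """Return fullnames of matching documents.

    Every required clause must match. With no required clauses,
    at least one optional clause must match (OR semantics).
    """
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in required):
            continue
        if required or any(clause(doc) for clause in optional):
            hits.append(doc["fullname"])
    return hits

# Illustrative sample data; positions and tags are assumptions.
players = [
    {"fullname": "David Villa", "position": "Striker", "tag": ["Valencia", "Spain"]},
    {"fullname": "David Pizzarro", "position": "Midfielder", "tag": ["Roma", "Chile"]},
    {"fullname": "Claudio Pizzarro", "position": "Striker", "tag": ["Werder Bremen", "Peru"]},
    {"fullname": "David Seaman", "position": "Goalkeeper", "tag": ["Arsenal", "England"]},
]

# David Pizzarro  (David OR Pizzarro)
either = match(players, optional=[name_has("David"), name_has("Pizzarro")])
# +David +tag:Roma
roma = match(players, required=[name_has("David"), field_is("tag", "Roma")])
# +David +position:(Striker OR Midfielder)
outfield = match(players, required=[
    name_has("David"),
    field_is("position", "Striker", "Midfielder"),
])
```

A real search server would also rank the hits by relevance; this sketch only decides which documents match.
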
  • Updating
  • Post new document to http://<host>/solr/update
  • Deleting
  • <delete> <id>345</id> </delete>
  • <delete> <query>tag:Brazil</query> </delete>
  • <delete> <query>*:*</query> </delete>
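
The three delete messages above can be generated rather than hand-written. The sketch below uses Python's standard ElementTree; the function names are hypothetical, and the resulting strings would be POSTed to the same /solr/update URL as new documents, followed by a commit.

```python
import xml.etree.ElementTree as ET

def delete_by_id(doc_id):
    """Build a <delete><id>...</id></delete> message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")

def delete_by_query(query):
    """Build a <delete><query>...</query></delete> message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")

one_doc = delete_by_id(345)              # remove a single document
by_tag = delete_by_query("tag:Brazil")   # remove everything tagged Brazil
everything = delete_by_query("*:*")      # remove the whole index; use with care
```
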
  • Thai support
  • fwdder.com
  • Sharing forward mails
  • Use a customized field type in schema.xml
  • <fieldType name="html_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
        <filter class="solr.ThaiWordFilterFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
  • <field name="id" type="string" indexed="true" stored="true" />
    <field name="title" type="html_th" indexed="true" stored="true" />
    <field name="detail" type="html_th" indexed="true" stored="true" />
    <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
    <field name="userid" type="integer" indexed="false" stored="true" />
  • Index analyzer
  • Debugging
  • &debugQuery=on
  • Further reading
    • http://lucene.apache.org/solr/
    • http://wiki.apache.org/solr
    • http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
    • http://lucene.apache.org/java/docs/scoring.html
  • Q&A