Slideshow transcript
Slide 1: Full Text Search with Apache Solr Pittaya Sroilong pittaya@gmail.com
Slide 2: Who am I?
Slide 4: Solr?
Slide 6: Not her!
Slide 7: But a search server
Slide 8: based on Lucene
Slide 9: Lucene?
Slide 10: Full-text search library
Slide 11: 100% Java :-(
Slide 12: Solr is based on Lucene
Slide 13: XML/HTTP, JSON interface
Slide 14: Open Source
Slide 15: Shields us from using Java :-)
Slide 16: Who uses Solr/Lucene?
Slide 17: Who uses Solr/Lucene?
Slide 18: What is our problem?
Slide 19: How do we implement this?
Slide 20: SELECT * FROM post WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' ORDER BY id DESC
Slide 21: SELECT * FROM post WHERE (topic LIKE '%aoi%' OR author LIKE '%aoi%') OR (topic LIKE '%miyabi%' OR author LIKE '%miyabi%') ORDER BY id DESC
Slide 22: Full table scan = Performance killer
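To make the full-table-scan point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows, and the index name are assumptions for illustration only; the original slides do not say which database was used. Even with an index on topic, a LIKE pattern that starts with a wildcard cannot use it, so the query plan reports a scan of the whole table.

```python
import sqlite3

# In-memory database with a hypothetical "post" table (layout assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, author TEXT)")
conn.execute("CREATE INDEX idx_post_topic ON post (topic)")
conn.executemany(
    "INSERT INTO post (topic, author) VALUES (?, ?)",
    [("aoi fan club", "somchai"), ("weekend trip", "aoi"), ("solr tips", "pittaya")],
)

query = (
    "SELECT * FROM post "
    "WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' "
    "ORDER BY id DESC"
)

# Run the query, then ask SQLite how it plans to execute it.
rows = conn.execute(query).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_details = [row[-1] for row in plan]

print(rows)
print(plan_details)
```

The matching rows also come back in id order only; nothing in the result says which post is the better match, which is the "no search scoring" problem.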
Slide 23: No search scoring
Slide 24: RDBMS isn’t designed to do this
Slide 25: Use the right tool!
Slide 26: [Diagram: the Indexer updates the index in Solr (built on Lucene); the Web App sends a query to Solr and gets the result back]
Slide 27: 1
Slide 28: Define schema.xml
Slide 29: <field name="id" type="string" indexed="true" stored="true" /> <field name="fullname" type="string" indexed="true" stored="true" /> <field name="position" type="string" indexed="true" stored="true" /> <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
Slide 30: 2
Slide 31: Deploy on any J2EE container
Slide 32: Tomcat, Jetty, etc.
Slide 33: 3
Slide 34: Index documents
Slide 35: Document format <add><doc> <field name="id">555</field> <field name="fullname">Kaka</field> <field name="position">Midfielder</field> <field name="tag">AC Milan</field> <field name="tag">Brazil</field> </doc></add>
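The document format above can be generated rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name build_add_xml is hypothetical. A list value becomes one field element per entry, which is how the multi-valued tag field is expressed.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict (hypothetical helper)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple yields one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


add_xml = build_add_xml(
    {
        "id": 555,
        "fullname": "Kaka",
        "position": "Midfielder",
        "tag": ["AC Milan", "Brazil"],
    }
)
print(add_xml)
```

Using an XML library also takes care of escaping characters such as & and < in field values.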
Slide 36: Post to Solr http://<host>/solr/update
Slide 37: Any language that can do HTTP POST
Slide 38: PHP, Perl, Python
Slide 39: cURL
Slide 40: Commit <commit />
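As a rough illustration of "any language that can do HTTP POST", this sketch prepares the add and commit requests with Python's urllib.request. The host and port (localhost:8983) and the helper names are assumptions; replace them with your own. The requests are only built here, not sent, so the snippet runs without a Solr server.

```python
import urllib.request

# Assumed location of the update handler; substitute your own <host>.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, url=SOLR_UPDATE_URL):
    """Prepare an HTTP POST carrying an XML update message (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def send(request):
    """Send a prepared request and return the response body (needs a live server)."""
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")


add_request = build_update_request(
    '<add><doc><field name="id">555</field></doc></add>'
)
commit_request = build_update_request("<commit/>")

print(add_request.get_method(), add_request.full_url)
# Against a running Solr instance, one would then call:
# send(add_request)
# send(commit_request)
```

Added documents become searchable only after the commit message has been posted.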
Slide 41: 4
Slide 42: Search
Slide 43: Query from http://<host>/solr/select
Slide 44: Use Solr query syntax
Slide 45: http://<host>/solr/select?q=tag:madrid&start=0&rows=2&fl=fullname,position,tag
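A select URL like the one above can be assembled with urllib.parse.urlencode, which also percent-encodes characters such as ":" and ",". This is a minimal sketch; the base URL and the helper name build_select_url are assumptions.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; substitute your own <host>.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(q, start=0, rows=10, fl=None, base=SOLR_SELECT_URL):
    """Build a query URL with paging (start, rows) and an optional field list (fl)."""
    params = {"q": q, "start": start, "rows": rows}
    if fl:
        params["fl"] = ",".join(fl)
    return base + "?" + urlencode(params)


url = build_select_url(
    "tag:madrid", start=0, rows=2, fl=["fullname", "position", "tag"]
)
print(url)
```

Here start and rows page through the hits, and fl limits which stored fields come back.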
Slide 46: Response in XML or JSON (configurable)
Slide 47: <response> <result numFound="46" start="0"> <doc> <str name="fullname">Sergio Ramos</str> <str name="position">Defender</str> <str name="tag">Real Madrid</str> <str name="tag">Spain</str> </doc> <doc> <str name="fullname">Diego Forlan</str> <str name="position">Striker</str> <str name="tag">Atletico Madrid</str> <str name="tag">Uruguay</str> </doc> </result> </response>
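A client could read the XML response along these lines. The sketch parses the simplified shape shown on the slide; a real Solr response carries extra elements (for example a response header) and may wrap multi-valued fields differently, so treat the element paths as assumptions. The function name parse_results is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample payload copied from the slide (simplified response shape).
SAMPLE_XML = """<response>
  <result numFound="46" start="0">
    <doc>
      <str name="fullname">Sergio Ramos</str>
      <str name="position">Defender</str>
      <str name="tag">Real Madrid</str>
      <str name="tag">Spain</str>
    </doc>
    <doc>
      <str name="fullname">Diego Forlan</str>
      <str name="position">Striker</str>
      <str name="tag">Atletico Madrid</str>
      <str name="tag">Uruguay</str>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (numFound, docs); each doc maps a field name to a list of values."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for str_el in doc_el.findall("str"):
            # Repeated names (such as tag) accumulate in the same list.
            doc.setdefault(str_el.get("name"), []).append(str_el.text)
        docs.append(doc)
    return int(result.get("numFound")), docs


num_found, docs = parse_results(SAMPLE_XML)
print(num_found, docs)
```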
Slide 48: &wt=json
Slide 49: { "result": { "numFound": 46, "start": 0, "docs" : [ { "fullname": "Sergio Ramos", "position": "Defender", "tag": ["Real Madrid", "Spain"] }, { "fullname": "Diego Forlan", "position": "Striker", "tag": ["Atletico Madrid", "Uruguay"] } ] } }
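The JSON form is easier to consume from scripting languages. The sketch below reads the payload with Python's json module. The slide nests the hits under a "result" key, while actual Solr JSON output places them under "response", so the hypothetical helper extract_hits checks both.

```python
import json

# Sample payload copied from the slide.
SAMPLE_JSON = """
{ "result": { "numFound": 46, "start": 0, "docs": [
    { "fullname": "Sergio Ramos", "position": "Defender",
      "tag": ["Real Madrid", "Spain"] },
    { "fullname": "Diego Forlan", "position": "Striker",
      "tag": ["Atletico Madrid", "Uruguay"] }
] } }
"""


def extract_hits(body):
    """Return (numFound, docs) from a JSON response body."""
    payload = json.loads(body)
    # Accept either the slide's "result" key or Solr's "response" key.
    container = payload.get("response") or payload.get("result") or {}
    return container.get("numFound", 0), container.get("docs", [])


num_found, docs = extract_hits(SAMPLE_JSON)
for doc in docs:
    print(doc["fullname"], "-", ", ".join(doc["tag"]))
```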
Slide 50: Query examples
Slide 51: • David Pizzarro • Equiv: David OR Pizzarro • Default operator is "OR" (configurable) • Result: David Villa, David Pizzarro, Claudio Pizzarro, David Seaman
Slide 52: • +David +tag:Roma • Equiv: David AND tag:Roma • Result: David Pizzarro
Slide 53: • +David +position:(Striker OR Midfielder) • Result: David Villa, David Pizzarro
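One practical detail when sending queries like these over HTTP: a literal "+" in a URL query string decodes as a space, so the required-term operator must be percent-encoded as %2B. The sketch below, assuming the query goes in the q parameter of the select URL, lets urllib.parse.urlencode handle this and then decodes the result again to confirm nothing was lost.

```python
from urllib.parse import parse_qs, urlencode

# The three example queries from the slides.
queries = [
    "David Pizzarro",
    "+David +tag:Roma",
    "+David +position:(Striker OR Midfielder)",
]

# Encode each query as the q parameter of a query string.
encoded = [urlencode({"q": q}) for q in queries]

# Decode again to check that the server would see the original text.
decoded = [parse_qs(e)["q"][0] for e in encoded]

for e in encoded:
    print(e)
```

Pasting an unencoded "+David" straight into a URL would silently drop the operator, turning the query back into an optional term.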
Slide 54: Updating
Slide 55: Post new document to http://<host>/solr/update
Slide 56: Deleting
Slide 57: <delete> <id>345</id> </delete>
Slide 58: <delete> <query>tag:Brazil</query> </delete>
Slide 59: <delete> <query>*:*</query> </delete>
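The three delete messages above follow two patterns, delete by id and delete by query. A minimal sketch that builds them with xml.etree.ElementTree follows; the helper names are hypothetical. The resulting strings would be posted to the same update URL as new documents.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a message that deletes the single document with this id."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id(345))
print(delete_by_query("tag:Brazil"))
print(delete_by_query("*:*"))  # matches every document, so it empties the index
```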
Slide 60: Thai support
Slide 61: fwdder.com
Slide 62: Sharing forward mails
Slide 65: Use customized field in schema.xml
Slide 66: <fieldType name="html_th" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/> <filter class="solr.ThaiWordFilterFactory" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType>
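To give a feel for what an analyzer chain does to text at index time, here is a loose Python imitation of three of the stages: HTML stripping with tokenizing, stop-word removal, and lowercasing. It is a toy under stated assumptions, not Solr's implementation. The stop-word set stands in for stopwords.txt, and Thai word segmentation, Porter stemming, and duplicate removal are left out because the standard library has no equivalents.

```python
import re

# Stand-in for stopwords.txt (assumed contents).
STOPWORDS = {"the", "a", "of", "and"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def analyze(text):
    """Toy pipeline: strip HTML, tokenize, drop stop words, lowercase."""
    tokens = re.findall(r"\w+", strip_html(text))
    # Case-insensitive stop-word check, like ignoreCase="true".
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return [t.lower() for t in tokens]


terms = analyze("<p>The <b>Best</b> of Forwarded Mail</p>")
print(terms)
```

The list that comes out is what would be stored in the index, which is why the same kind of processing matters at query time too.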
Slide 67: <field name="id" type="string" indexed="true" stored="true" /> <field name="title" type="html_th" indexed="true" stored="true" /> <field name="detail" type="html_th" indexed="true" stored="true" /> <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" /> <field name="userid" type="integer" indexed="false" stored="true" />
Slide 68: Index analyzer
Slide 69: Debugging
Slide 70: &debugQuery=on
Slide 71: Further reading • http://lucene.apache.org/solr/ • http://wiki.apache.org/solr • http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html • http://lucene.apache.org/java/docs/scoring.html
Slide 72: Q&A


