Using Apache Solr

From pittaya, 4 months ago

intro to full text search solution, Apache Solr

Slideshow transcript

Slide 1: Full Text Search with Apache Solr Pittaya Sroilong pittaya@gmail.com

Slide 2: Who am I?

Slide 4: Solr?

Slide 6: Not her!

Slide 7: But a search server

Slide 8: based on Lucene

Slide 9: Lucene?

Slide 10: Full-text search library

Slide 11: 100% Java :-(

Slide 12: Solr is based on Lucene

Slide 13: XML/HTTP, JSON interface

Slide 14: Open Source

Slide 15: Shields us from using Java :-)

Slide 16: Who uses Solr/Lucene?

Slide 17: Who uses Solr/Lucene?

Slide 18: What is our problem?

Slide 19: How do we implement this?

Slide 20: SELECT * FROM post WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' ORDER BY id DESC

Slide 21: SELECT * FROM post WHERE (topic LIKE '%aoi%' OR author LIKE '%aoi%') OR (topic LIKE '%miyabi%' OR author LIKE '%miyabi%') ORDER BY id DESC

Slide 22: Full table scan = Performance killer
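To make the full-table-scan point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows, the index name, and the helper names are all assumptions made for illustration; the slides do not say which database the original site used. The sketch builds the same LIKE query for any number of search terms and asks SQLite for its query plan.

```python
import sqlite3


def build_like_query(terms):
    """Build the LIKE query from the slides for one or more search terms."""
    clause = " OR ".join("(topic LIKE ? OR author LIKE ?)" for _ in terms)
    sql = f"SELECT id, topic, author FROM post WHERE {clause} ORDER BY id DESC"
    # Each term is used twice: once for topic, once for author.
    params = [f"%{term}%" for term in terms for _ in range(2)]
    return sql, params


def demo():
    """Run the query on hypothetical sample data and return rows plus query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, author TEXT)")
    # An index on topic does not help: the pattern starts with a wildcard.
    conn.execute("CREATE INDEX idx_post_topic ON post (topic)")
    conn.executemany(
        "INSERT INTO post (id, topic, author) VALUES (?, ?, ?)",
        [(1, "aoi news", "bob"), (2, "weather", "miyabi"), (3, "sports", "carol")],
    )
    sql, params = build_like_query(["aoi", "miyabi"])
    rows = conn.execute(sql, params).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    conn.close()
    return rows, plan
```

Calling demo() returns the matching rows and a plan whose step mentions a scan of the post table, even though an index exists. Every extra search term adds two more LIKE comparisons per row, and the rows come back ordered by id rather than by relevance.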

Slide 23: No search scoring

Slide 24: RDBMS isn’t designed to do this

Slide 25: Use the right tool!

Slide 26: [Architecture diagram. Labels: Indexer, Update index, Query, Solr, Web App, Lucene, Result]

Slide 27: 1

Slide 28: Define schema.xml

Slide 29:
<field name="id" type="string" indexed="true" stored="true" />
<field name="fullname" type="string" indexed="true" stored="true" />
<field name="position" type="string" indexed="true" stored="true" />
<field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />

Slide 30: 2

Slide 31: Deploy on any J2EE container

Slide 32: Tomcat, Jetty, etc.

Slide 33: 3

Slide 34: Index documents

Slide 35: Document format
<add><doc>
  <field name="id">555</field>
  <field name="fullname">Kaka</field>
  <field name="position">Midfielder</field>
  <field name="tag">AC Milan</field>
  <field name="tag">Brazil</field>
</doc></add>

Slide 36: Post to Solr http://<host>/solr/update

Slide 37: Any language that can do HTTP POST

Slide 38: PHP, Perl, Python

Slide 39: cURL

Slide 40: Commit <commit />
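The indexing steps above (build an add message, POST it to the update URL, then commit) might look like the following in Python, using only the standard library. This is a sketch under assumptions: the function names are made up for illustration, and the text/xml content type is an assumption about what the update handler expects.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize one document as an <add><doc>...</doc></add> message.

    A list value becomes repeated <field> elements, matching the
    multi-valued "tag" field in the slides.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def post_update(update_url, xml_body):
    """POST an XML message to the update URL and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running server, indexing would be two calls to post_update against http://<host>/solr/update: first with the output of build_add_xml, then with the string "<commit />".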

Slide 41: 4

Slide 42: Search

Slide 43: Query from http://<host>/solr/select

Slide 44: Use Solr query syntax

Slide 45: http://<host>/solr/select?q=tag:madrid&start=0&rows=2&fl=fullname,position,tag
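A select URL like the one above can be assembled in code rather than by hand. The sketch below uses Python's urllib.parse; the function name and the host solr.example are hypothetical. Note that urlencode percent-encodes the colon and commas, which an HTTP server decodes back to the original values.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, start=0, rows=10, fields=None):
    """Build a select URL with q, start, rows and an optional fl field list."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if fields:
        # fl limits which stored fields are returned for each document.
        params.append(("fl", ",".join(fields)))
    return base_url + "?" + urlencode(params)


# Hypothetical host; replace with the real Solr server.
url = build_select_url(
    "http://solr.example/solr/select",
    "tag:madrid",
    start=0,
    rows=2,
    fields=["fullname", "position", "tag"],
)
```

Paging through results is then a matter of increasing start by rows on each request.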

Slide 46: Response in XML or JSON (configurable)

Slide 47:
<response>
  <result numFound="46" start="0">
    <doc>
      <str name="fullname">Sergio Ramos</str>
      <str name="position">Defender</str>
      <str name="tag">Real Madrid</str>
      <str name="tag">Spain</str>
    </doc>
    <doc>
      <str name="fullname">Diego Forlan</str>
      <str name="position">Striker</str>
      <str name="tag">Atletico Madrid</str>
      <str name="tag">Uruguay</str>
    </doc>
  </result>
</response>
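An XML response shaped like the one on this slide can be turned into plain dictionaries with Python's xml.etree.ElementTree. This is only a sketch: the function name is made up, and it assumes the simplified layout shown on the slide, where a multi-valued field appears as repeated elements with the same name. Output from a real server may be structured differently, for example by wrapping multi-valued fields in a container element.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(xml_text):
    """Return (numFound, docs) from a response shaped like the slide's example."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for el in doc_el:
            name = el.get("name")
            if name in doc:
                # Repeated name: collect the values into a list.
                if not isinstance(doc[name], list):
                    doc[name] = [doc[name]]
                doc[name].append(el.text)
            else:
                doc[name] = el.text
        docs.append(doc)
    return int(result.get("numFound")), docs
```

numFound is the total number of matches, which can be larger than the number of documents returned because rows caps the page size.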

Slide 48: &wt=json

Slide 49:
{
  "result": {
    "numFound": 46,
    "start": 0,
    "docs": [
      { "fullname": "Sergio Ramos", "position": "Defender", "tag": ["Real Madrid", "Spain"] },
      { "fullname": "Diego Forlan", "position": "Striker", "tag": ["Atletico Madrid", "Uruguay"] }
    ]
  }
}
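The JSON form needs even less code on the client side. The sketch below reads a body shaped like the slide's example with Python's json module; the function name is hypothetical. The slide labels the top-level object "result", but the key name on a real server may differ, so as an assumption the sketch also accepts "response".

```python
import json


def parse_json_response(json_text):
    """Return (numFound, docs) from a JSON body shaped like the slide's example."""
    data = json.loads(json_text)
    # "result" follows the slide; "response" is an assumed alternative key.
    result = data.get("result") or data.get("response")
    return result["numFound"], result["docs"]
```

Multi-valued fields such as tag arrive as JSON arrays, so they become Python lists without extra handling.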

Slide 50: Query examples

Slide 51:
• David Pizzarro
• Equiv: David OR Pizzarro
• Default operator is "OR" (configurable)
• Result: David Villa, David Pizzarro, Claudio Pizzarro, David Seaman

Slide 52:
• +David +tag:Roma
• Equiv: David AND tag:Roma
• Result: David Pizzarro

Slide 53:
• +David +position:(Striker OR Midfielder)
• Result: David Villa, David Pizzarro
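The three query examples differ only in how clauses are combined: the default operator matches a document if any clause matches, while a + prefix makes a clause required. The toy matcher below imitates that behavior in plain Python. It is not how Solr or Lucene work internally, it does no scoring, and the positions and tags in the sample data are hypothetical values chosen so that the results agree with the slides.

```python
# Hypothetical sample data; only the names come from the slides.
PLAYERS = [
    {"fullname": "David Villa", "position": "Striker", "tag": ["Spain"]},
    {"fullname": "David Pizzarro", "position": "Midfielder", "tag": ["Roma"]},
    {"fullname": "Claudio Pizzarro", "position": "Striker", "tag": ["Peru"]},
    {"fullname": "David Seaman", "position": "Goalkeeper", "tag": ["England"]},
]


def field_words(doc, field):
    """Lowercased words found in one field (single value or list of values)."""
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    return {word.lower() for v in values for word in v.split()}


def clause_matches(doc, field, alternatives):
    """True if the field contains at least one of the alternative words."""
    words = field_words(doc, field)
    return any(alt.lower() in words for alt in alternatives)


def search(docs, clauses, require_all):
    """clauses is a list of (field, [alternative words]).

    require_all=False imitates the default OR operator;
    require_all=True imitates a + prefix on every clause.
    """
    combine = all if require_all else any
    return [
        doc["fullname"]
        for doc in docs
        if combine(clause_matches(doc, field, alts) for field, alts in clauses)
    ]


# "David Pizzarro" with the default OR operator
any_match = search(PLAYERS, [("fullname", ["David"]), ("fullname", ["Pizzarro"])], False)
# "+David +tag:Roma"
roma = search(PLAYERS, [("fullname", ["David"]), ("tag", ["Roma"])], True)
# "+David +position:(Striker OR Midfielder)"
attackers = search(PLAYERS, [("fullname", ["David"]), ("position", ["Striker", "Midfielder"])], True)
```

A real search server would also rank the matches, so a document that matches both words of the first query would normally appear ahead of one that matches only a single word.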

Slide 54: Updating

Slide 55: Post new document to http://<host>/solr/update

Slide 56: Deleting

Slide 57: <delete> <id>345</id> </delete>

Slide 58: <delete> <query>tag:Brazil</query> </delete>

Slide 59: <delete> <query>*:*</query> </delete>
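The three delete messages above differ only in whether they name an id or a query, so they are easy to generate. A minimal sketch with Python's xml.etree.ElementTree follows; the function names are made up for illustration, and the resulting strings would be POSTed to the same update URL as new documents.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message that removes one document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching a query."""
    delete = ET.Element("delete")
    # ElementTree escapes &, < and > in the query text when serializing.
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")
```

The query *:* matches every document, so delete_by_query("*:*") empties the whole index and deserves extra care.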

Slide 60: Thai support

Slide 61: fwdder.com

Slide 62: Sharing forward mails

Slide 65: Use customized field in schema.xml

Slide 66:
<fieldType name="html_th" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Slide 67:
<field name="id" type="string" indexed="true" stored="true" />
<field name="title" type="html_th" indexed="true" stored="true" />
<field name="detail" type="html_th" indexed="true" stored="true" />
<field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
<field name="userid" type="integer" indexed="false" stored="true" />

Slide 68: Index analyzer
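An index analyzer is a pipeline: a tokenizer splits the text, then each filter transforms the token list in turn. The toy Python pipeline below imitates a few stages of the html_th chain (HTML stripping, tokenizing, stop-word removal, lowercasing) to show the idea. It is not Solr's implementation; Thai word segmentation, stemming, and duplicate removal are left out, and the stop-word set is a hypothetical stand-in for stopwords.txt.

```python
import re

# Hypothetical stand-in for the contents of stopwords.txt.
STOPWORDS = {"the", "a", "of"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Split text into word tokens."""
    return re.findall(r"\w+", text)


def stop_filter(tokens):
    """Drop stop words, ignoring case (like ignoreCase="true")."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text):
    """Run the tokenizer, then each filter in the order the chain lists them."""
    tokens = tokenize(strip_html(text))
    for token_filter in (stop_filter, lowercase_filter):
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("<p>The <b>Best</b> of Forward Mails</p>")
```

The order of the stages matters: because the stop filter here runs before lowercasing, it has to compare case-insensitively to catch "The".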

Slide 69: Debugging

Slide 70: &debugQuery=on

Slide 71: Further readings
• http://lucene.apache.org/solr/
• http://wiki.apache.org/solr
• http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
• http://lucene.apache.org/java/docs/scoring.html

Slide 72: Q&A