Using Apache Solr


An introduction to Apache Solr, a full-text search solution.

Published in: Technology

  1. Full Text Search with Apache Solr. Pittaya Sroilong, pittaya@gmail.com
  2. Who am I?
  3. Solr?
  4. Not her!
  5. But a search server
  6. based on Lucene
  7. Lucene?
  8. Full-text search library
  9. 100% Java :-(
  10. Solr is based on Lucene
  11. XML/HTTP, JSON interface
  12. Open Source
  13. Shields us from using Java :-)
  14. Who uses Solr/Lucene?
  15. Who uses Solr/Lucene?
  16. What is our problem?
  17. How do we implement this?
  18. SELECT * FROM post
      WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%'
      ORDER BY id DESC
  19. SELECT * FROM post
      WHERE (topic LIKE '%aoi%' OR author LIKE '%aoi%')
         OR (topic LIKE '%miyabi%' OR author LIKE '%miyabi%')
      ORDER BY id DESC
  20. Full table scan = Performance killer
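The full-table-scan claim can be checked with a small experiment. The sketch below uses SQLite from the Python standard library, which is an assumption: the slides do not name the database, and other engines report their plans differently. The table layout and index names are hypothetical. It asks the query planner how it would run the `LIKE '%aoi%'` query versus an exact match on an indexed column.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hypothetical schema: the slides only show the column names used in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, author TEXT)")
conn.execute("CREATE INDEX idx_post_topic ON post (topic)")
conn.execute("CREATE INDEX idx_post_author ON post (author)")

# Substring match with a leading wildcard: the indexes cannot be used.
like_plan = query_plan(
    conn,
    "SELECT * FROM post WHERE topic LIKE ? OR author LIKE ? ORDER BY id DESC",
    ("%aoi%", "%aoi%"),
)

# Exact match on an indexed column, for comparison.
exact_plan = query_plan(conn, "SELECT * FROM post WHERE topic = ?", ("aoi",))

print("LIKE  :", like_plan)
print("exact :", exact_plan)
```

In SQLite the first plan is reported as a scan of `post`, while the second is a search using `idx_post_topic`. Every extra search term adds more `LIKE` clauses to the same scan.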
  21. No search scoring
  22. An RDBMS isn't designed to do this
  23. Use the right tool!
  24. Architecture diagram (labels: Indexer, Update index, Solr, Lucene, Web App, Query, Result)
  25. Step 1
  26. Define schema.xml
  27. <field name="id" type="string" indexed="true" stored="true" />
      <field name="fullname" type="string" indexed="true" stored="true" />
      <field name="position" type="string" indexed="true" stored="true" />
      <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
  28. Step 2
  29. Deploy on any J2EE container
  30. Tomcat, Jetty, etc.
  31. Step 3
  32. Index documents
  33. Document format
      <add>
        <doc>
          <field name="id">555</field>
          <field name="fullname">Kaka</field>
          <field name="position">Midfielder</field>
          <field name="tag">AC Milan</field>
          <field name="tag">Brazil</field>
        </doc>
      </add>
  34. Post to Solr: http://<host>/solr/update
  35. Any language that can do HTTP POST
  36. PHP, Perl, Python
  37. cURL
  38. Commit: <commit />
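The indexing step (build the `<add>` document, POST it, then commit) might look like this in Python, one of the languages the slides mention. This is a minimal sketch using only the standard library. The helper names `build_add_xml` and `post_update` are hypothetical, and the network calls are left commented out because they need a running Solr instance at the slides' `http://<host>/solr/update` placeholder.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def post_update(update_url, xml_body):
    """POST an XML update message and return the response body as text."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


kaka = {
    "id": 555,
    "fullname": "Kaka",
    "position": "Midfielder",
    "tag": ["AC Milan", "Brazil"],
}
add_xml = build_add_xml(kaka)
print(add_xml)

# With a running server, replace <host> and uncomment:
# post_update("http://<host>/solr/update", add_xml)
# post_update("http://<host>/solr/update", "<commit />")
```

Building the XML with `ElementTree` rather than string concatenation means characters such as `&` and `<` in field values are escaped automatically.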
  39. Step 4
  40. Search
  41. Query from http://<host>/solr/select
  42. Use Solr query syntax
  43. http://<host>/solr/select?q=tag:madrid&start=0&rows=2&fl=fullname,position,tag
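A select URL like the one above can be assembled from its parts rather than typed by hand. The sketch below assumes the standard Solr parameters `q` (query), `start` and `rows` (paging), `fl` (field list) and `wt` (response format). The function name `build_select_url` and the `localhost:8983` address are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, start=0, rows=10, fields=None, wt=None):
    """Build a Solr select URL from a query string and paging options."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if fields:
        params.append(("fl", ",".join(fields)))  # fields to return
    if wt:
        params.append(("wt", wt))  # response writer, e.g. "json"
    # Keep ':' and ',' readable; everything else is percent-encoded as needed.
    return base_url + "?" + urlencode(params, safe=":,")


url = build_select_url(
    "http://localhost:8983/solr/select",  # assumed host and port
    "tag:madrid",
    start=0,
    rows=2,
    fields=["fullname", "position", "tag"],
)
print(url)
```

Passing `wt="json"` to the same function appends `&wt=json` to the URL.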
  44. Response in XML or JSON (configurable)
  45. <response>
        <result numFound="46" start="0">
          <doc>
            <str name="fullname">Sergio Ramos</str>
            <str name="position">Defender</str>
            <str name="tag">Real Madrid</str>
            <str name="tag">Spain</str>
          </doc>
          <doc>
            <str name="fullname">Diego Forlan</str>
            <str name="position">Striker</str>
            <str name="tag">Atletico Madrid</str>
            <str name="tag">Uruguay</str>
          </doc>
        </result>
      </response>
  46. &wt=json
  47. {
        "result": {
          "numFound": 46,
          "start": 0,
          "docs": [
            { "fullname": "Sergio Ramos", "position": "Defender",
              "tag": ["Real Madrid", "Spain"] },
            { "fullname": "Diego Forlan", "position": "Striker",
              "tag": ["Atletico Madrid", "Uruguay"] }
          ]
        }
      }
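Consuming the JSON response from a client takes only a few lines. The sketch below parses the sample exactly as it appears on the slide. The top-level key name is taken from the slide and passed as a parameter, because a live Solr server typically nests the results under a `response` key next to a `responseHeader`. The function name `summarize` is hypothetical.

```python
import json

# The sample response from the slide, reproduced as a string.
SAMPLE = """
{
  "result": {
    "numFound": 46,
    "start": 0,
    "docs": [
      {"fullname": "Sergio Ramos", "position": "Defender",
       "tag": ["Real Madrid", "Spain"]},
      {"fullname": "Diego Forlan", "position": "Striker",
       "tag": ["Atletico Madrid", "Uruguay"]}
    ]
  }
}
"""


def summarize(raw_json, result_key="result"):
    """Return (total hit count, names on this page) from a JSON response."""
    result = json.loads(raw_json)[result_key]
    names = [doc["fullname"] for doc in result["docs"]]
    return result["numFound"], names


total, names = summarize(SAMPLE)
print(f"{total} matches, showing {len(names)}: {', '.join(names)}")
```

`numFound` is the total number of matches, while `docs` holds only the page selected by `start` and `rows`, so the two counts differ here (46 versus 2).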
  48. Query examples
  49. • David Pizzarro
      • Equiv: David OR Pizzarro
      • Default operator is "OR" (configurable)
      • Result: David Villa, David Pizzarro, Claudio Pizzarro, David Seaman
  50. • +David +tag:Roma
      • Equiv: David AND tag:Roma
      • Result: David Pizzarro
  51. • +David +position:(Striker OR Midfielder)
      • Result: David Villa, David Pizzarro
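The difference between the default OR behaviour and required (`+`) clauses can be illustrated with a toy in-memory matcher. This is not how Solr or Lucene evaluate queries, and it does no scoring or query parsing. It only mimics the boolean logic of the three examples. The positions and tags below are made-up sample data chosen to be consistent with the results on the slides, and treating bare terms as searches on `fullname` is also an assumption.

```python
# Hypothetical sample data, consistent with the results shown on the slides.
PLAYERS = [
    {"fullname": "David Villa", "position": "Striker", "tag": ["Spain"]},
    {"fullname": "David Pizzarro", "position": "Midfielder", "tag": ["Roma"]},
    {"fullname": "Claudio Pizzarro", "position": "Striker", "tag": ["Peru"]},
    {"fullname": "David Seaman", "position": "Goalkeeper", "tag": ["England"]},
]


def words_in(value):
    """Lower-cased words of a string field or a multi-valued field."""
    values = value if isinstance(value, list) else [value]
    return {word.lower() for item in values for word in item.split()}


def clause_matches(doc, field, accepted):
    """True if the field contains at least one of the accepted words."""
    return bool(words_in(doc[field]) & {word.lower() for word in accepted})


def search(docs, clauses, require_all):
    """Match docs against (field, accepted words) clauses.

    require_all=False mimics the default OR operator;
    require_all=True mimics prefixing every clause with '+'.
    """
    combine = all if require_all else any
    return [
        doc["fullname"]
        for doc in docs
        if combine(clause_matches(doc, field, accepted) for field, accepted in clauses)
    ]


# David Pizzarro  (David OR Pizzarro)
any_term = search(PLAYERS, [("fullname", ["David"]), ("fullname", ["Pizzarro"])], False)
# +David +tag:Roma
david_roma = search(PLAYERS, [("fullname", ["David"]), ("tag", ["Roma"])], True)
# +David +position:(Striker OR Midfielder)
david_pos = search(
    PLAYERS, [("fullname", ["David"]), ("position", ["Striker", "Midfielder"])], True
)

print(any_term)
print(david_roma)
print(david_pos)
```

In the third query the inner `OR` lives inside a single required clause, which is why Claudio Pizzarro (a Striker in this sample data, but not a David) is still excluded.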
  52. Updating
  53. Post the new document to http://<host>/solr/update
  54. Deleting
  55. <delete>
        <id>345</id>
      </delete>
  56. <delete>
        <query>tag:Brazil</query>
      </delete>
  57. <delete>
        <query>*:*</query>
      </delete>
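The three delete messages above can be generated programmatically. The sketch below only builds the XML bodies, using hypothetical helper names. Sending them would be an HTTP POST to the same `http://<host>/solr/update` endpoint used for adding documents, presumably followed by a `<commit />` as with adds.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message that removes one document by its id."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id(345))              # one document
print(delete_by_query("tag:Brazil"))  # all documents tagged Brazil
print(delete_by_query("*:*"))         # everything in the index
```

The `*:*` query matches every document, so the last message empties the whole index and deserves care in any script that builds queries from user input.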
  58. Thai support
  59. fwdder.com
  60. Sharing forward mails
  61. Use a customized field in schema.xml
  62. <fieldType name="html_th" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
          <filter class="solr.ThaiWordFilterFactory" />
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>
  63. <field name="id" type="string" indexed="true" stored="true" />
      <field name="title" type="html_th" indexed="true" stored="true" />
      <field name="detail" type="html_th" indexed="true" stored="true" />
      <field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
      <field name="userid" type="integer" indexed="false" stored="true" />
  64. Index analyzer
  65. Debugging
  66. &debugQuery=on
  67. Further reading
      • http://lucene.apache.org/solr/
      • http://wiki.apache.org/solr
      • http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
      • http://lucene.apache.org/java/docs/scoring.html
  68. Q&A
