Dev8d Apache Solr Tutorial

<ul>Small wins in a small time with Apache Solr (Upayavira) </ul>
<ul>Who am I? </ul><ul><li>My (Buddhist) name is Upayavira
Consultant with Sourcesense, specialising in search and operational technologies
A member of the Apache Software Foundation </li></ul>
<ul>Who are Sourcesense? </ul><ul><li>Open Source integrator, specialising in: </li></ul><ul><ul><ul><li>Search
Business Intelligence
Content Management
Application Lifecycle Management </li></ul></ul></ul><ul><li>Offices in London, Amsterdam, Milan and Rome </li></ul>
<ul>Committers and Contributors </ul><ul><li>Search: </li></ul><ul><ul><ul><li>Lucene/Solr – contributor
Hibernate Search – committer
Lucene Infinispan integration – lead developer
Apache UIMA – committer </li></ul></ul></ul><ul><li>CMS: </li></ul><ul><ul><ul><li>Apache Chemistry – contributor
Apache Jackrabbit – contributor
JBoss GateIn Portal – committer
OpenSSO-Alfresco – contributor </li></ul></ul></ul>
<ul>What is Lucene? </ul><ul><li>Lucene is a Java information retrieval library
Provides free-text search facilities
Started in 2000 by Doug Cutting
A project of the Apache Software Foundation
Designed to be embedded in Java applications </li></ul>
<ul>What is Solr? </ul><ul><li>Solr is an enterprise search server based on Lucene
Wraps Lucene with a RESTful web interface
Provides a configurable schema
Provides replication functionality </li></ul>
<ul>Solr Design </ul><ul><li>[Diagram: Solr instance (UpdateRequestHandler, SearchHandler, Lucene index); content application; user queries] </li></ul>
<ul>Prerequisites </ul><ul><li>Java, preferably Java 6
The latest Apache Solr, currently 3.3
The tutorial files: http://www.sourcesense.com/dev8d-solr.zip </li></ul>
<ul>Prerequisites </ul><ul><li>Extract your Solr distribution
At a command prompt: </li><ul><li>cd into the unzipped distribution directory
cd into the example directory
Enter: java -jar start.jar </li></ul><li>Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, Solr is working
Unpack your dev8d-solr.zip file
At another command prompt, cd into your dev8d-solr directory </li></ul>
<ul>Checking Solr Works </ul><ul><li>Visit http://localhost:8983/solr/admin/
You should see the Solr admin page
Click the statistics link
You'll see NumDocs: 0
There's nothing in the index yet, so searches won't return much
So we need to index some sample content </li></ul>
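The same check can be scripted. Below is a minimal Python sketch. It assumes the example instance at localhost:8983 and the `wt=json` response writer that appears later in this tutorial. A `q=*:*` query with `rows=0` returns no documents, but its `response.numFound` value is the number of documents in the index. The helper names are illustrative and not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr's example Jetty instance on the default port.
SOLR = "http://localhost:8983/solr"


def count_url(base=SOLR):
    """Select URL that matches every document (*:*) but returns none of them."""
    return base + "/select?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})


def num_found(response):
    """Total number of matching documents in a parsed wt=json response."""
    return response["response"]["numFound"]


def count_documents(base=SOLR):
    """Ask a running Solr how many documents it holds (needs network access)."""
    with urlopen(count_url(base)) as resp:
        return num_found(json.load(resp))
```

Before any content is posted, `count_documents()` should agree with the statistics page and return 0.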
<ul>Indexing Sample Content </ul><ul><li>In your dev8d-solr directory (extracted from the zip), at a command prompt, enter:
java -jar post.jar wikipedia-basic.xml </li></ul>
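post.jar is a small client. By default it sends the file to Solr's XML update handler at http://localhost:8983/solr/update and then commits. Below is a rough Python equivalent. It assumes the Solr 3.x XML update format: an add element containing doc elements, each holding named field elements. The field names used in the test data are hypothetical and not copied from wikipedia-basic.xml.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Assumption: the example instance's XML update handler, which post.jar targets by default.
UPDATE_URL = "http://localhost:8983/solr/update"


def add_message(docs):
    """Build a Solr XML add message from a list of dicts.

    A list value becomes one field element per value, which is how
    values for a multiValued field are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            for v in value if isinstance(value, list) else [value]:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="utf-8")


def post_xml(body, url=UPDATE_URL):
    """POST an XML update message to Solr and return the HTTP status code."""
    req = Request(url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.status
```

Added documents only become searchable after a commit, so you would follow `post_xml(add_message([...]))` with `post_xml(b"<commit/>")`. post.jar does this step for you.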
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=*:* </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computers </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer systems </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computers OR systems </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computers AND systems </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=&quot;computer systems&quot; </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=&quot;computer systems&quot;~10 </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computers NOT data </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select?q=computers -data </li></ul>
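A browser quietly URL-encodes the spaces and quotes in these queries. Code that builds the URLs has to do this explicitly. The small Python sketch below uses the standard library's `urlencode`. The `select_url` helper is illustrative, and the base URL assumes the example instance.

```python
from urllib.parse import urlencode

# Assumption: the example instance's standard select handler.
SELECT = "http://localhost:8983/solr/select"


def select_url(query):
    """Return a select URL with the q parameter URL-encoded."""
    return SELECT + "?" + urlencode({"q": query})


# Query strings from the slides above, exactly as typed into the browser.
for q in ["computer systems", "computers AND systems",
          '"computer systems"~10', "computers -data"]:
    print(select_url(q))
```

Encoding matters most for `+`. Solr decodes a bare `+` in a URL as a space, so a required-term query such as `+computers` has to be sent as `%2Bcomputers`, which is what `urlencode` produces.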
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&fl=title </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&fq=author:yobot </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=title:system&fl=title </li></ul>
<ul>Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc </li></ul>
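Solr needs only `rows` and `start` for paging. Page n (counting from 0) of size r starts at start = n × r, so the second page of ten is `start=10`, as on the slide. The minimal Python sketch below assembles these parameters together with `fl`, `fq` and `sort`. The function name and defaults are illustrative.

```python
from urllib.parse import urlencode

# Assumption: the example instance's standard select handler.
SELECT = "http://localhost:8983/solr/select"


def search_url(q, page=0, page_size=10, fl=(), fq=(), sort=None):
    """Build a select URL for one zero-based page of results.

    fq is a sequence because Solr accepts the fq parameter more than once;
    each filter query further restricts the results.
    """
    params = [("q", q), ("start", page * page_size), ("rows", page_size)]
    if fl:
        params.append(("fl", ",".join(fl)))
    params.extend(("fq", f) for f in fq)
    if sort:
        params.append(("sort", sort))
    return SELECT + "?" + urlencode(params)
```

For example, `search_url("computers", page=1, fl=["title"])` gives the same request as the rows/start slide: `...select?q=computers&start=10&rows=10&fl=title`.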
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2 </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3 </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true </li></ul>
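With `wt=json`, Solr returns each facet field as a flat list that alternates term and count (the default `json.nl=flat` layout). The short Python sketch below pairs these up and applies an optional client-side minimum, mirroring what `facet.mincount` does on the server. The sample response is invented for illustration and is not real output from the Wikipedia sample.

```python
def facet_pairs(response, field, mincount=0):
    """Turn Solr's flat [term, count, term, count, ...] facet list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return [(term, count) for term, count in pairs if count >= mincount]


# Hypothetical response fragment for ...&facet=true&facet.field=author&wt=json
sample = {
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"author": ["yobot", 5, "alice", 2, "bob", 1]},
    }
}
```

Filtering on the server with `facet.mincount` is still preferable for large facet lists, because the unwanted terms are never sent.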
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer&wt=json </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer&wt=javabin </li></ul>
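`wt` selects the response writer. `json` is convenient from scripting languages. `javabin` is a compact binary format intended for Java clients such as SolrJ, which is why it looks like noise in a browser. The minimal Python sketch below reads the documents out of a `wt=json` response. Stored multiValued fields arrive as JSON arrays and single-valued ones as plain values, so the sketch normalises both.

```python
import json


def documents(body):
    """Return the list of documents from a wt=json select response body."""
    return json.loads(body)["response"]["docs"]


def values(doc, field):
    """Normalise a stored field to a list.

    multiValued fields arrive as JSON arrays, single-valued fields as plain
    values, and fields the document lacks are simply absent.
    """
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]
```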
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text </li></ul>
<ul>Advanced Searching </ul><ul><li>Look for the highlighting list after the main results
Nothing there: highlighting needs the field's original text, and 'text' is not stored
Edit the 'text' field in schema.xml, changing it to stored=&quot;true&quot;
Restart Solr, then reindex (java -jar post.jar wikipedia-enhanced.xml) </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text
You should now see highlighted content </li></ul>
<ul>Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text&hl.simple.pre=&lt;b&gt;&hl.simple.post=&lt;/b&gt; </li></ul>
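Highlighting results come back in a separate `highlighting` section, keyed by each document's unique key rather than nested inside the documents. The client therefore has to join the two. The minimal Python sketch below assumes `wt=json`, a unique key field named `id` (as in the example schema) and the `text` field used above.

```python
def snippets_by_doc(response, field="text", key="id"):
    """Pair each returned document with its highlighted snippets for one field."""
    highlighting = response.get("highlighting", {})
    result = []
    for doc in response["response"]["docs"]:
        doc_id = doc[key]
        # JSON object keys are strings; a document with no match in `field`
        # gets an empty entry (or none at all), so default to no snippets.
        snippets = highlighting.get(str(doc_id), {}).get(field, [])
        result.append((doc_id, snippets))
    return result
```

Each snippet already contains the `hl.simple.pre`/`hl.simple.post` markers, so a web page can insert it directly. The escaping of the rest of the snippet is worth checking first.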
<ul>Indexing </ul>
<ul>Indexing </ul><ul><li>Open wikipedia-basic.xml in a text editor or web browser
Open wikipedia-enhanced.xml in a text editor or web browser
Open example/solr/conf/schema.xml in a text editor </li></ul>
<ul>Indexing </ul><ul><li>schema.xml defines the field types and fields used in Solr
Equivalent to the database schema in an RDBMS </li></ul>
<ul>Indexing </ul><ul><li>Change the category field in schema.xml to type &quot;string&quot; and add multiValued=&quot;true&quot;, so that it reads: &lt;field name=&quot;category&quot; type=&quot;string&quot; indexed=&quot;true&quot; stored=&quot;true&quot; multiValued=&quot;true&quot;/&gt; </li></ul>
<ul>Indexing </ul><ul><li>Now add these to the &lt;fields&gt; section of schema.xml:
&lt;field name=&quot;source&quot; type=&quot;string&quot; indexed=&quot;true&quot; stored=&quot;true&quot; multiValued=&quot;false&quot;/&gt;
&lt;field name=&quot;text_general&quot; type=&quot;text_general&quot; indexed=&quot;true&quot; stored=&quot;true&quot; multiValued=&quot;true&quot;/&gt;
Now search for the “text_general” field type definition, further up in the file. </li></ul>
<ul>Indexing </ul><ul><li>At the bottom of schema.xml, just before the closing &lt;/schema&gt; tag, add the following:
&lt;copyField source=&quot;text&quot; dest=&quot;text_general&quot;/&gt; </li></ul>
<ul>Indexing </ul><ul><li>In the window where Solr is running, press CTRL+C to stop Solr, then restart it with:
java -jar start.jar </li></ul>
<ul>Indexing </ul><ul><li>At your command prompt, in the dev8d-solr directory, execute:
java -jar post.jar wikipedia-enhanced.xml </li></ul>
<ul>More Advanced Searching </ul><ul><li>http://localhost:8983/solr/select?q=computer%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1 </li></ul>
<ul>More Advanced Searching </ul><ul><li>http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20 </li></ul>
<ul>More Advanced Searching </ul><ul><li>http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20 </li></ul>
<ul>More Advanced Searching </ul><ul><li>http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20&terms.prefix=at </li></ul>
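`terms.prefix` makes the terms handler a simple basis for autocomplete: it lists indexed terms that start with what the user has typed so far. Below is a minimal Python sketch. It assumes the `/terms` handler used on these slides and a JSON layout of the form `"terms": {"text_general": [term, count, term, count, ...]}`. The helper names are illustrative. The returned terms are the indexed, analysed forms rather than the original words.

```python
from urllib.parse import urlencode

# The /terms handler used on the slides (assumption: example instance).
TERMS = "http://localhost:8983/solr/terms"


def suggest_url(prefix, field="text_general", limit=10):
    """Build a terms request listing indexed terms that start with prefix."""
    params = {"terms": "true", "terms.fl": field,
              "terms.prefix": prefix, "terms.limit": limit, "wt": "json"}
    return TERMS + "?" + urlencode(params)


def suggestions(response, field="text_general"):
    """Keep only the terms from the flat [term, count, ...] list, in Solr's order."""
    return response["terms"][field][0::2]
```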
<ul>Indexing </ul><ul><li>Index segmentation: merge factor
Index optimisation: &lt;optimize/&gt; </li></ul>
<ul>schema.xml </ul><ul><li>Equivalent to an RDBMS schema
Seen it before!
Let's look through it in more detail... </li></ul>
<ul>solrconfig.xml </ul><ul><li>Configures the components available to a Solr system
Specific to a Solr 'core', as is schema.xml
In the same directory as schema.xml
Let's look through it in more detail... </li></ul>
<ul>Hints and Tips </ul>
<ul>Hints and Tips: Prototyping </ul><ul><li>Velocity response writer (/browse)
Data Import Handler (DIH)
XSLTUpdateRequestHandler (Solr 3.4) </li></ul>
<ul>Hints and Tips: Architecture </ul><ul><li>A RESTful service
An index, not a data store: keep the ability to reindex
Don't make Solr do things you wouldn't have MySQL do </li></ul>
<ul>Hints and Tips: Security </ul><ul><li>There is none: Solr has no built-in security
So use a firewall
Beware which Solr internals you expose: </li><ul><li>Query syntax
The qt= parameter (e.g. qt=update) </li></ul><li>Fake document-level security with role fields and filter queries </li></ul>
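One way to apply these tips is to let the application, not the end user, decide which parameters reach Solr. The minimal Python sketch below takes that approach. The `roles` field name is hypothetical: it stands for a multiValued string field listing the roles allowed to see each document. User input only ever becomes `q`. A filter query restricting results to the user's roles is always added, so a user cannot smuggle in `qt=update` or remove the filter.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable only by the application, behind a firewall.
SELECT = "http://localhost:8983/solr/select"


def quote_term(value):
    """Quote a value for the Lucene query parser, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def secure_params(user_query, user_roles, role_field="roles", rows=10):
    """Parameters for an untrusted search: only q comes from the user.

    role_field is a hypothetical multiValued string field; the fq built from
    the user's roles is always present, so results are always restricted.
    """
    if not user_roles:
        raise ValueError("no roles given: refusing to build an unfiltered query")
    roles = " OR ".join(quote_term(r) for r in user_roles)
    return [("q", user_query), ("rows", rows), ("fq", f"{role_field}:({roles})")]


def secure_search_url(user_query, user_roles):
    """Encode the fixed parameter list; '&' or '=' typed by the user stay inside q."""
    return SELECT + "?" + urlencode(secure_params(user_query, user_roles))
```

This does not limit the query syntax users can type into `q`. Solr's dismax query parser (`defType=dismax`) accepts a much smaller syntax than the standard parser and is one way to narrow that as well.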
<ul>Hints and Tips: Scaling </ul><ul><li>Index too large: distributed search
Too much traffic: replicated search
How much is too much: unanswerable! </li></ul>
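In Solr 3.x, distributed search is requested per query with the `shards` parameter. The instance that receives the request acts as the co-ordinator. It sends the query to each listed shard (written as `host:port/path`, without `http://`) and merges the results. The minimal Python sketch below builds such a request. The host names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; each shard holds a separate slice of the index.
SHARDS = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]


def distributed_url(coordinator, q, shards=SHARDS, rows=10):
    """Build a select URL asking `coordinator` to query every shard and merge."""
    params = {"q": q, "rows": rows, "shards": ",".join(shards)}
    return coordinator.rstrip("/") + "/select?" + urlencode(params)
```

Documents need unique keys across all shards for the merge to work. Replicated search is the complementary approach: identical copies of an index behind a load balancer, any of which can answer a query.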
<ul>Time for Questions </ul><ul><li>And your questions are... </li></ul>
<ul>Thank you [email_address] </ul>
<ul>Solr Host Configuration </ul><ul><li>[Diagram, built up over five slides: shard 1, shard 2 and shard 3 receiving searches; then a co-ordinator; then a load balancer; then a second set of shard 1, shard 2, shard 3 and co-ordinator] </li></ul>