Dev8d Apache Solr Tutorial
Published in: Technology
Transcript

  • 1. Small wins in a small time with Apache Solr (Upayavira)
  • 2. Who am I? My (Buddhist) name is Upayavira
  • 3. Consultant with Sourcesense, specialising in search and operational technologies
  • 4. A member of the Apache Software Foundation
  • 5. Who are Sourcesense? Open Source integrator, specialising in: Search
  • 6. Business Intelligence
  • 7. Content Management
  • 8. Application Lifecycle Management. Offices in London, Amsterdam, Milan and Rome
  • 9. Committers and Contributors. Search: Lucene/Solr – contributor
  • 10. Hibernate Search – committer
  • 11. Lucene Infinispan integration – lead developer
  • 12. Apache UIMA – committer. CMS: Apache Chemistry – contributor
  • 13. Apache Jackrabbit – contributor
  • 14. JBoss GateIn Portal – committer
  • 15. OpenSSO-Alfresco – contributor
  • 16. What is Lucene? Lucene is a Java information retrieval library
  • 17. Provides free text search facilities
  • 18. Started in 2000, by Doug Cutting
  • 19. A project of the Apache Software Foundation
  • 20. It is designed to be embedded in Java apps
  • 21. What is Solr? Solr is an enterprise search server based on Lucene
  • 22. Wraps Lucene with a RESTful web interface
  • 23. Provides configurable schema
  • 24. Provides replication functionality
  • 25. Solr Design (diagram labels: content application, UpdateRequestHandler, SearchHandler, user queries, Solr instance, Lucene index)
  • 26. Prerequisites: Java, preferably Java 6
  • 27. Latest Apache Solr, currently 3.3
  • 28. http://www.sourcesense.com/dev8d-solr.zip
  • 29. Prerequisites: Extract your Solr distribution
  • 30. At a command prompt: cd into the unzipped distribution directory
  • 31. cd into the example directory
  • 32. Enter: java -jar start.jar. Then visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works
  • 33. Unpack your dev8d-solr.zip file
  • 34. At another command prompt, cd into your dev8d-solr directory
  • 35. Checking Solr Works: Visit http://localhost:8983/solr/admin/
  • 36. You should see the Solr admin page.
  • 37. Click the statistics link
  • 38. You'll see numDocs: 0
  • 39. There's nothing in the index, so searches won't show much
  • 40. So we need to index some sample content
  • 41. Indexing Sample Content: In your dev8d-solr directory (extracted from the zip), at a command prompt, run:
  • 42. java -jar post.jar wikipedia-basic.xml
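To make the post.jar step less of a black box, here is a minimal Python sketch of the kind of HTTP request it sends. It assumes Solr's default update endpoint at http://localhost:8983/solr/update and the standard XML update format; the document fields below (id, title) are made up for illustration and are not taken from wikipedia-basic.xml. The requests are only built, not sent.

```python
from urllib.request import Request

# Assumption: the default update endpoint of the example Solr instance.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body: str, url: str = SOLR_UPDATE_URL) -> Request:
    """Build (but do not send) an HTTP POST carrying an XML update message."""
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


# A hypothetical one-document update message, followed by a commit so the
# new document becomes visible to searches.
add_request = build_update_request(
    '<add><doc><field name="id">1</field>'
    '<field name="title">Example</field></doc></add>'
)
commit_request = build_update_request("<commit/>")

# To actually send them to a running Solr:
#   from urllib.request import urlopen
#   urlopen(add_request)
#   urlopen(commit_request)
```

After a successful add and commit, numDocs on the statistics page should no longer be 0.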
  • 43. Searching: http://localhost:8983/solr/select?q=*:*
  • 44. Searching: http://localhost:8983/solr/select?q=computers
  • 45. Searching: http://localhost:8983/solr/select?q=computer systems
  • 46. Searching: http://localhost:8983/solr/select?q=computers OR systems
  • 47. Searching: http://localhost:8983/solr/select?q=computers AND systems
  • 48. Searching: http://localhost:8983/solr/select?q="computer systems"
  • 49. Searching: http://localhost:8983/solr/select?q="computer systems"~10
  • 50. Searching: http://localhost:8983/solr/select?q=computers NOT data
  • 51. Searching: http://localhost:8983/solr/select?q=computers -data
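The URLs above contain spaces and quotes, which a browser encodes for you. When calling Solr from code you need to encode them yourself. A minimal sketch, assuming the same local Solr instance; select_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def select_url(query: str) -> str:
    """Return a select URL with the query string safely URL-encoded."""
    return SELECT_URL + "?" + urlencode({"q": query})


# The same queries as on the slides, written as plain strings.
queries = [
    "*:*",
    "computers OR systems",
    "computers AND systems",
    '"computer systems"',
    '"computer systems"~10',
    "computers NOT data",
    "computers -data",
]

for query in queries:
    print(select_url(query))
```

Spaces come out as + and double quotes as %22, which Solr decodes back to the original query.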
  • 52. Searching: http://localhost:8983/solr/select/?q=computers&fl=title
  • 53. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot
  • 54. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
  • 55. Searching: http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
  • 56. Searching: http://localhost:8983/solr/select/?q=title:system&fl=title
  • 57. Searching: http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
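The fl, fq, rows, start and sort parameters combine freely. A minimal sketch of a helper that assembles them, with paging expressed as a page number rather than a raw offset; search_url and its argument names are hypothetical, only the Solr parameter names come from the slides:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def search_url(q, page=0, page_size=10, fields=None, filters=(), sort=None):
    """Build a select URL; page 0 is the first page of results."""
    params = [("q", q), ("rows", page_size), ("start", page * page_size)]
    if fields:
        # fl: comma-separated list of fields to return
        params.append(("fl", ",".join(fields)))
    for filter_query in filters:
        # fq may be repeated; each one narrows the result set
        params.append(("fq", filter_query))
    if sort:
        params.append(("sort", sort))
    return SELECT_URL + "?" + urlencode(params)


# Second page of ten titles, as on the rows/start slide.
print(search_url("computers", page=1, fields=["title"]))

# Filtered by author and sorted, combining several slides.
print(search_url("computers", fields=["title", "author"],
                 filters=["author:yobot"], sort="author desc"))
```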
  • 58. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  • 59. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  • 60. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
  • 61. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
  • 62. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
  • 63. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
  • 64. Advanced Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
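If the facet queries above are requested with wt=json, the counts for a field arrive by default as one flat list that alternates value and count. A minimal sketch of turning that into a dictionary; the sample response is hand-written for illustration (the counts and the second author are invented), not real output from the tutorial index:

```python
def facet_counts(response: dict, field: str) -> dict:
    """Pair up the flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape of a Solr JSON response with rows=0.
sample_response = {
    "response": {"numFound": 3, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"author": ["yobot", 2, "someone", 1]},
    },
}

for author, count in facet_counts(sample_response, "author").items():
    print(author, count)
```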
  • 65. Advanced Searching: http://localhost:8983/solr/select?q=computer&wt=json
  • 66. Advanced Searching: http://localhost:8983/solr/select?q=computer&wt=javabin
  • 67. Advanced Searching: http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text
  • 68. Advanced Searching: Look for the highlighting list after the main results
  • 69. Nothing there.
  • 70. Edit the 'text' field in schema.xml, changing it to stored="true"
  • 71. Reindex (java -jar post.jar wikipedia-enhanced.xml)
  • 72. Advanced Searching: http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text
  • 73. You should now see highlighted content
  • 74. Advanced Searching: http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text&hl.simple.pre=<b>&hl.simple.post=</b>
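The angle brackets in hl.simple.pre and hl.simple.post are not legal in a URL as typed, so code that builds this request should encode them. A minimal sketch, assuming the same local Solr and the 'text' field from the slides; highlight_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def highlight_url(q, field="text", pre="<b>", post="</b>"):
    """Build a select URL that asks Solr to wrap matches in pre/post markers."""
    params = {
        "q": q,
        "hl": "true",
        "hl.fl": field,           # field(s) to generate snippets from
        "hl.simple.pre": pre,     # inserted before each match
        "hl.simple.post": post,   # inserted after each match
    }
    return SELECT_URL + "?" + urlencode(params)


print(highlight_url("computer"))
```

Highlighting only returns snippets for stored fields, which is why the 'text' field had to be changed to stored="true" first.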
  • 75. Indexing
  • 76. Indexing: Load wikipedia-basic.xml into a text editor or web browser
  • 77. Load wikipedia-enhanced.xml into a text editor or browser
  • 78. Load example/solr/conf/schema.xml into a text editor
  • 79. Indexing: schema.xml defines field types and fields used in Solr
  • 80. Equivalent to your database schema in an RDBMS
  • 81. Indexing: Change this field in schema.xml to be of type "string" and add multiValued="true": <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  • 82. Indexing: Now add these to the <fields> section of schema.xml:
  • 83. <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
  • 84. <field name="text_general" type="text_general" indexed="true" stored="true" multiValued="true"/>
  • 85. Now search for the "text_general" field type definition, further up in the file.
  • 86. Indexing: At the bottom of schema.xml (before the closing </schema> tag) add the following:
  • 87. <copyField source="text" dest="text_general"/>
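A typo in a field name is an easy mistake when hand-editing the schema. The sketch below is one possible sanity check before restarting Solr: it parses a schema and reports any copyField whose source or dest is not a declared field. The fragment is a cut-down stand-in for the real schema.xml; the schema name, version and the type of the 'text' field are assumptions, and the check ignores dynamic fields and wildcard sources, which real schemas may use.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for example/solr/conf/schema.xml (name, version and the
# type of the "text" field are assumptions for illustration).
SCHEMA_FRAGMENT = """
<schema name="example" version="1.4">
  <fields>
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="text_general" type="text_general" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="text_general"/>
</schema>
"""


def check_copy_fields(schema_xml: str) -> list:
    """Return a list of problems; an empty list means every copyField resolves."""
    root = ET.fromstring(schema_xml)
    field_names = {field.get("name") for field in root.iter("field")}
    problems = []
    for copy in root.iter("copyField"):
        for attr in ("source", "dest"):
            name = copy.get(attr)
            if name not in field_names:
                problems.append(f"copyField {attr} '{name}' is not a declared field")
    return problems


print(check_copy_fields(SCHEMA_FRAGMENT) or "copyField entries look consistent")
```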
  • 88. Indexing: In the window where Solr is running, press CTRL+C to stop Solr, and then restart it with:
  • 89. java -jar start.jar
  • 90. Indexing: At your command prompt, in the dev8d-solr directory, execute:
  • 91. java -jar post.jar wikipedia-enhanced.xml
  • 92. More Advanced Searching: http://localhost:8983/solr/select?q=computer%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
  • 93. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
  • 94. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20
  • 95. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20&terms.prefix=at
  • 96. Indexing: Index segmentation: merge factor
  • 97. Index optimisation: <optimize/>
  • 98. schema.xml: Equivalent to RDBMS schema
  • 99. Seen it before!
  • 100. Let's look through it in more detail...
  • 101. solrconfig.xml: Configures the components available to a Solr system
  • 102. Specific to a Solr 'core', as is schema.xml
  • 103. In same directory as schema.xml
  • 104. Let's look through it in more detail...
  • 105. Hints and Tips
  • 106. Hints and Tips: Prototyping: Velocity response writer (/browse)
  • 107. Data Import Handler (DIH)
  • 108. XSLTUpdateRequestHandler (Solr 3.4)
  • 109. Hints and Tips: Architecture: A RESTful service
  • 110. An index, not a data store: keep ability to re-index
  • 111. Don't make Solr do things you wouldn't have MySQL do
  • 112. Hints and Tips: Security: There is none
  • 113. So use a firewall
  • 114. Beware what Solr internals you expose: Query syntax
  • 115. qt= parameter (e.g. qt=update). Fake document level security with role fields and filter queries
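One way to read "role fields and filter queries": index each document with the roles allowed to see it, and have the application (never the end user) append a filter query listing the current user's roles. A minimal sketch under those assumptions; the field name roles and the role values are hypothetical and would need a matching multiValued string field in schema.xml:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def role_filter(roles, field="roles"):
    """Build an fq value matching documents visible to any of the given roles."""
    if not roles:
        # No roles means no access; refuse rather than build an open query.
        raise ValueError("user has no roles")
    quoted = []
    for role in roles:
        # Escape backslashes and quotes so a role name cannot break the phrase.
        escaped = role.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return field + ":(" + " OR ".join(quoted) + ")"


def secured_search_url(user_query, roles):
    """The user controls q only; the application adds the role filter."""
    return SELECT_URL + "?" + urlencode({"q": user_query, "fq": role_filter(roles)})


print(role_filter(["staff", "admin"]))
print(secured_search_url("computers", ["staff", "admin"]))
```

This only works if users cannot reach Solr directly, which is the point of the firewall advice above.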
  • 116. Hints and Tips: Scaling: Index too large: distributed search
  • 117. Too much traffic: replicated search
  • 118. How much is too much: unanswerable!
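Distributed search in Solr is driven by the shards request parameter: the instance that receives the query fans it out to every listed shard and merges the results. A minimal sketch of building such a request; the host names are hypothetical, and each shard is given as host:port/path without the http:// prefix:

```python
from urllib.parse import urlencode


def distributed_search_url(q, shards, coordinator="http://localhost:8983/solr"):
    """Build a select URL that asks the coordinator to query all shards."""
    params = {"q": q, "shards": ",".join(shards)}
    return coordinator + "/select?" + urlencode(params)


# Hypothetical shard hosts, each holding part of the index.
shards = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]
print(distributed_search_url("computers", shards))
```

Replication addresses the other problem, too much traffic, by putting identical copies of an index behind a load balancer.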
  • 119. Time for Questions: And your questions are...
  • 120. Thank you [email_address]
  • 121. Solr Host Configuration (diagram labels: searches; shard 1, shard 2, shard 3)
  • 122. Solr Host Configuration (diagram labels: co-ordinator; shard 1, shard 2, shard 3)
  • 123. Solr Host Configuration (diagram labels: load balancer; co-ordinator; shard 1, shard 2, shard 3)
  • 124. Solr Host Configuration (diagram labels: load balancer; two co-ordinators, each with shard 1, shard 2, shard 3)
  • 125. Solr Host Configuration (diagram labels: load balancer; two co-ordinators, each with shard 1, shard 2, shard 3)
