Small wins in a small time with Apache Solr

Slides from a two-hour hands-on tutorial on Apache Solr at Dev8D UK: http://wiki.2011.dev8d.org/w/Session-WK16

"This is an introductory tutorial on Apache Solr, an open source enterprise search engine with a restful web interface."

  1. Small wins in a small time with Apache Solr
  2. Who am I? My (Buddhist) name is Upayavira. Consultant with Sourcesense, specialising in search and operational technologies. A member of the Apache Software Foundation.
  3. Who are Sourcesense? Open Source integrator, specialising in: Search; Business Intelligence; Content Management; Application Lifecycle Management. Offices in London, Amsterdam, Milan and Rome.
  4. Committers and Contributors. Search: Lucene/Solr (contributor); Hibernate Search (committer); Lucene Infinispan integration (lead developer); Apache UIMA (committer). CMS: Apache Chemistry (contributor); Apache Jackrabbit (contributor); JBoss GateIn Portal (committer); OpenSSO-Alfresco (contributor).
  5. What is Lucene? Lucene is a Java information retrieval library. It provides free-text search facilities. Started in 2000 by Doug Cutting. A project of the Apache Software Foundation. It is designed to be embedded in Java apps.
  6. What is Solr? Solr is an enterprise search server based on Lucene. It wraps Lucene with a RESTful web interface, provides a configurable schema, and provides replication functionality.
  7. Solr Design (diagram; labels only): user queries, Solr instance, SearchHandler, Lucene index, UpdateRequestHandler, content application.
  8. Prerequisites: Java, preferably Java 6. Apache Solr 1.4.1. http://www.sourcesense.com/dev8d-solr.zip
  9. Prerequisites: Extract your Solr distribution. At a command prompt: cd into the unzipped distribution directory; cd into the example directory; enter: java -jar start.jar. Then visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works. Unpack your dev8d-solr.zip file. At another command prompt, cd into your dev8d-solr directory.
  10. Checking Solr Works: Visit http://localhost:8983/solr/admin/ and you should see the Solr admin page. Click the statistics link. You'll see NumDocs: 0. There's nothing in the index, so searches won't show much, so we need to index some sample content.
  11. Indexing Sample Content: In your dev8d-solr directory (extracted from the zip), at a command prompt: java -jar post.jar wikipedia-basic.xml
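
post.jar is a small client that sends the named XML file over HTTP to Solr's update handler (by default http://localhost:8983/solr/update) and then commits. As a rough sketch of what such a file holds, assuming the sample data uses Solr's standard XML update format (an `<add>` element wrapping `<doc>` elements made of named `<field>`s), the Python below builds one such message. The field names and values are made up for illustration; the real ones are whatever wikipedia-basic.xml contains.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialise a list of {field: value-or-list} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes one <field> element per entry,
            # which is how multi-valued fields are written.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: these field names are assumptions, not taken from the zip.
sample = [{"id": "1", "title": "Computer", "author": "yobot"}]
xml_body = build_add_xml(sample)
print(xml_body)
```

The resulting string is the kind of body that would be POSTed to the update handler; nothing becomes searchable until a commit follows.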
  12. Searching: http://localhost:8983/solr/select?q=*:*
  13. Searching: http://localhost:8983/solr/select?q=computers
  14. Searching: http://localhost:8983/solr/select?q=computer systems
  15. Searching: http://localhost:8983/solr/select?q=computers OR systems
  16. Searching: http://localhost:8983/solr/select?q=computers AND systems
  17. Searching: http://localhost:8983/solr/select?q="computer systems"
  18. Searching: http://localhost:8983/solr/select?q="computer systems"~10
  19. Searching: http://localhost:8983/solr/select?q=computers NOT data
  20. Searching: http://localhost:8983/solr/select?q=computers -data
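
The URLs on the slides above work when pasted into a browser because the browser percent-encodes the spaces and quotes for you. From a script, the query has to be encoded explicitly. A minimal sketch, assuming the tutorial's local Solr address; `select_url` is a hypothetical helper name, not part of Solr:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(query, **params):
    """Return a /select URL with the query string percent-encoded."""
    all_params = {"q": query}
    all_params.update(params)
    return SOLR_SELECT + "?" + urlencode(all_params)


# The same query styles as the slides: match-all, boolean OR,
# exact phrase, and exclusion of a term.
queries = ["*:*", "computers OR systems", '"computer systems"', "computers -data"]
for q in queries:
    print(select_url(q))
```

`urlencode` turns spaces into `+` and quotes into `%22`, which Solr decodes back into the original query text.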
  21. Searching: http://localhost:8983/solr/select/?q=computers&fl=title
  22. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot
  23. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
  24. Searching: http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
  25. Searching: http://localhost:8983/solr/select/?q=title:system&fl=title
  26. Searching: http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
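
These slides introduce the parameters that shape a result list: `fl` picks the returned fields, `fq` filters without affecting scoring, `rows` and `start` page through results (`start` is a zero-based offset, so `rows=10&start=10` is the second page), and `sort` orders them. One way to put them together, as a sketch only; `page_url` and its defaults are assumptions made for illustration:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"


def page_url(query, page, page_size=10, fields=("title",), filters=(), sort=None):
    """Build a /select URL for one zero-based page of results."""
    params = [("q", query)]
    # Each filter query is sent as its own fq parameter.
    params += [("fq", f) for f in filters]
    params.append(("fl", ",".join(fields)))
    params.append(("rows", page_size))
    # start is an offset in documents, not a page number.
    params.append(("start", page * page_size))
    if sort:
        params.append(("sort", sort))
    return SOLR_SELECT + "?" + urlencode(params)


print(page_url("computers", page=1))
print(page_url("computers", page=0, fields=("title", "author"),
               filters=["author:yobot"], sort="author desc"))
```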
  27. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  28. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
  29. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
  30. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
  31. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
  32. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
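
In these faceting requests, `rows=0` suppresses the documents so only the counts come back, `facet.sort` chooses alphabetical (`lex`) or by-count ordering, `facet.mincount` drops rare values, and `facet.limit` caps the list. If the same request is made with `wt=json`, Solr by default returns each facet field as one flat list that alternates value and count. The sketch below pairs that list up; the sample data is invented for illustration and is not output from the tutorial index.

```python
def facet_pairs(flat):
    """Turn ['a', 3, 'b', 1] into [('a', 3), ('b', 1)]."""
    return list(zip(flat[0::2], flat[1::2]))


# Illustrative response fragment (made-up authors and counts).
sample = {
    "facet_counts": {
        "facet_fields": {"author": ["yobot", 4, "smackbot", 2, "anon", 1]}
    }
}

counts = facet_pairs(sample["facet_counts"]["facet_fields"]["author"])
# Client-side equivalent of what facet.mincount=2 does on the server.
top = [(term, n) for term, n in counts if n >= 2]
print(counts)
print(top)
```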
  33. Searching: http://localhost:8983/solr/select?q=computer&wt=json
  34. Searching: http://localhost:8983/solr/select?q=computer&wt=javabin
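
The `wt` parameter selects the response writer; `wt=json` is the convenient one for scripts. A minimal sketch of reading such a response, assuming the usual layout with a `response` object holding `numFound` and `docs`. The `fetch` helper is defined but not called here, and the sample string is hand-written rather than real Solr output.

```python
import json
from urllib.request import urlopen


def fetch(url):
    """Return the body of a Solr response as text (needs a running Solr)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def summarise(raw_json, field="title"):
    """Return (total hit count, values of one field for the returned docs)."""
    response = json.loads(raw_json)["response"]
    values = [doc.get(field) for doc in response["docs"]]
    return response["numFound"], values


# Hand-written stand-in for fetch("http://localhost:8983/solr/select?q=computer&wt=json").
sample_raw = (
    '{"responseHeader":{"status":0,"QTime":1},'
    '"response":{"numFound":2,"start":0,'
    '"docs":[{"title":"Computer"},{"title":"Computer science"}]}}'
)
print(summarise(sample_raw))
```

Note that `numFound` is the total number of matches, which can be larger than the number of documents returned when `rows` limits the page.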
  35. Indexing
  36. Indexing: Load wikipedia-basic.xml into a text editor or web browser. Load wikipedia-enhanced.xml into a text editor or browser. Load example/solr/conf/schema.xml into a text editor.
  37. Indexing: schema.xml defines the field types and fields used in Solr. It is the equivalent of your database schema in an RDBMS.
  38. Indexing: Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each: <field name="links" type="string" indexed="true" stored="true" multiValued="true"/> <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  39. Indexing: Now add this to the <fields> section of schema.xml: <field name="source" type="string" indexed="true" stored="true" multiValued="false"/> <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/> Now search for the "textgen" field type definition, further up in the file.
  40. Indexing: Near the bottom of schema.xml, inside the <schema> element, add the following: <copyField source="text" dest="textgen"/>
  41. Indexing: Restart Solr so the schema changes are picked up. Then, at your command prompt, in the dev8d-solr directory, execute: java -jar post.jar wikipedia-enhanced.xml
  42. More Advanced Searching: http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
  43. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
  44. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
  45. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
  46. Thank you. upayavira@sourcesense.com
  47. Solr Host Configuration (diagram; labels only): shard 1, shard 2, shard 3, searches.
  48. Solr Host Configuration (diagram; labels only): shard 1, shard 2, shard 3, co-ordinator.
  49. Solr Host Configuration (diagram; labels only): shard 1, shard 2, shard 3, co-ordinator, load balancer.
  50. Solr Host Configuration (diagram; labels only): shard 1, shard 2 and shard 3, each shown twice; co-ordinator, shown twice; load balancer.
  51. Solr Host Configuration (diagram; labels only): shard 1, shard 2 and shard 3, each shown twice; co-ordinator, shown twice; load balancer.
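
The host configuration slides survive only as diagram labels, so the following is an assumption about how the pieces connect rather than the presenter's setup. In Solr's distributed search, a co-ordinator is an ordinary Solr instance that receives a request carrying a `shards` parameter (a comma-separated list of `host:port/path` entries), queries each shard, and merges the results. A sketch of building such a request, with hypothetical host names:

```python
from urllib.parse import urlencode


def distributed_url(coordinator, shard_hosts, query):
    """Build a query URL that asks the co-ordinator to fan out to the shards."""
    # Each shard is named as host:port/path, without the http:// scheme.
    shards = ",".join(f"{host}/solr" for host in shard_hosts)
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    query_string = urlencode({"q": query, "shards": shards}, safe=":/,")
    return f"http://{coordinator}/solr/select?{query_string}"


# Hypothetical hosts standing in for the boxes in the diagram.
url = distributed_url(
    "coordinator:8983",
    ["shard1:8983", "shard2:8983", "shard3:8983"],
    "computers",
)
print(url)
```

With duplicated shards and co-ordinators, as on the last two slides, a load balancer would typically sit in front of the co-ordinators and each `shards` entry could point at either copy of a shard.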