Small wins in a small time with Apache Solr

Slides used in a 2-hour long hands-on tutorial on Apache Solr at Dev8D UK: http://wiki.2011.dev8d.org/w/Session-WK16

"This is an introductory tutorial on Apache Solr, an open source enterprise search engine with a restful web interface."

Published in: Technology

  1. Small wins in a small time with Apache Solr
  2. Who am I? My (Buddhist) name is Upayavira. Consultant with Sourcesense, specialising in search and operational technologies. A member of the Apache Software Foundation.
  3. Who are Sourcesense? Open Source integrator, specialising in: Search; Business Intelligence; Content Management; Application Lifecycle Management. Offices in London, Amsterdam, Milan and Rome.
  4. Committers and Contributors. Search: Lucene/Solr (contributor); Hibernate Search (committer); Lucene Infinispan integration (lead developer); Apache UIMA (committer). CMS: Apache Chemistry (contributor); Apache Jackrabbit (contributor); JBoss GateIn Portal (committer); OpenSSO-Alfresco (contributor).
  5. What is Lucene? Lucene is a Java information retrieval library. It provides free-text search facilities. Started in 2000 by Doug Cutting. A project of the Apache Software Foundation. It is designed to be embedded in Java apps.
  6. What is Solr? Solr is an enterprise search server based on Lucene. It wraps Lucene with a RESTful web interface, provides a configurable schema, and provides replication functionality.
  7. Solr Design (diagram labels): User queries; Solr instance; SearchHandler; Lucene index; UpdateRequestHandler; content application
  8. Prerequisites: Java, preferably Java 6. Apache Solr 1.4.1. http://www.sourcesense.com/dev8d-solr.zip
  9. Prerequisites: Extract your Solr distribution. At a command prompt: cd into the unzipped distribution directory; cd into the example directory; enter: java -jar start.jar. Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works. Unpack your dev8d-solr.zip file. At another command prompt, cd into your dev8d-solr directory.
  10. Checking Solr Works: Visit http://localhost:8983/solr/admin/ and you should see the Solr admin page. Click the statistics link. You'll see NumDocs: 0. There's nothing in the index, so searches won't show much, so we need to index some sample content.
  11. Indexing Sample Content: In your dev8d-solr directory (extracted from the zip), at a command prompt: java -jar post.jar wikipedia-basic.xml
  12. Searching: http://localhost:8983/solr/select?q=*:*
  13. Searching: http://localhost:8983/solr/select?q=computers
  14. Searching: http://localhost:8983/solr/select?q=computer systems
  15. Searching: http://localhost:8983/solr/select?q=computers OR systems
  16. Searching: http://localhost:8983/solr/select?q=computers AND systems
  17. Searching: http://localhost:8983/solr/select?q="computer systems"
  18. Searching: http://localhost:8983/solr/select?q="computer systems"~10
  19. Searching: http://localhost:8983/solr/select?q=computers NOT data
  20. Searching: http://localhost:8983/solr/select?q=computers -data
  21. Searching: http://localhost:8983/solr/select/?q=computers&fl=title
  22. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot
  23. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
  24. Searching: http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
  25. Searching: http://localhost:8983/solr/select/?q=title:system&fl=title
  26. Searching: http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
  27. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  28. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
  29. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
  30. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
  31. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
  32. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
  33. Searching: http://localhost:8983/solr/select?q=computer&wt=json
  34. Searching: http://localhost:8983/solr/select?q=computer&wt=javabin
  35. Indexing
  36. Indexing: Load wikipedia-basic.xml into a text editor or web browser. Load wikipedia-enhanced.xml into a text editor or browser. Load example/solr/conf/schema.xml into a text editor.
  37. Indexing: schema.xml defines the field types and fields used in Solr. It is the equivalent of your database schema in an RDBMS.
  38. Indexing: Change these two fields in schema.xml to be of type "string" and add multiValued="true" to each: <field name="links" type="string" indexed="true" stored="true" multiValued="true"/> <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  39. Indexing: Now add this to the <fields> section of schema.xml: <field name="source" type="string" indexed="true" stored="true" multiValued="false"/> <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/> Now search for the "textgen" field type definition, further up in the file.
  40. Indexing: At the bottom of schema.xml, add the following: <copyField source="text" dest="textgen"/>
  41. Indexing: At your command prompt, in the dev8d-solr directory, execute: java -jar post.jar wikipedia-enhanced.xml
  42. More Advanced Searching: http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
  43. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
  44. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
  45. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
  46. Thank you. upayavira@sourcesense.com
  47. Solr Host Configuration (diagram labels): searches; shard 1, shard 2, shard 3
  48. Solr Host Configuration (diagram labels): shard 1, shard 2, shard 3; co-ordinator
  49. Solr Host Configuration (diagram labels): shard 1, shard 2, shard 3; co-ordinator; load balancer
  50. Solr Host Configuration (diagram labels): two copies each of shard 1, shard 2 and shard 3; two co-ordinators; load balancer
  51. Solr Host Configuration (diagram labels): two copies each of shard 1, shard 2 and shard 3; two co-ordinators; load balancer
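
The search slides type each query straight into the browser address bar, which quietly encodes spaces and quotes. A script has to do that encoding itself. The following is a minimal Python sketch, not part of the original tutorial, that builds the same select URLs and tidies up facet counts. The helper names `select_url` and `pair_facets` are hypothetical. It also assumes that, with `wt=json`, Solr returns each `facet.field` result as a flat list alternating term and count, which is the default layout as far as I know.

```python
from urllib.parse import urlencode

# Select handler URL used throughout the slides.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q, **params):
    """Build a URL-encoded select URL for query q plus extra parameters.

    Solr parameters containing dots are not valid Python keyword names,
    so pass them via a dict: select_url("computers", **{"facet.field": "author"}).
    Extra parameters are sorted by name so the output is deterministic.
    """
    pairs = [("q", q)] + sorted(params.items())
    return SELECT_URL + "?" + urlencode(pairs)


def pair_facets(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs.

    Assumes the flat alternating layout described above.
    """
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Phrase query from the slides; the quotes and the space get encoded.
    print(select_url('"computer systems"', fl="title"))

    # Facet on author without returning documents (rows=0).
    print(select_url("computers", **{"facet": "true",
                                     "facet.field": "author",
                                     "rows": 0}))

    # Made-up facet counts, for illustration only.
    print(pair_facets(["yobot", 3, "example_author", 1]))
```

To run a query for real, pass the resulting URL (with `wt="json"` added) to `urllib.request.urlopen` while the example Solr instance from the prerequisites slides is running.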
