Enterprise Search in Plone using Solr
Calvin Hendryx-Parker
Plone Symposium East 2010

What is Solr?
•   Java Based
•   Full-Text Search
•   Web Services API
•   Standards Based Interfaces
•   Scalable
•   XML Configuration
•   Extensible
Playing with Solr


•   Indexing
•   Query




sixfeetup.com/deploy
nowhere to go but
open source
sixfeetup.com
Solr Features
•   Data Schema
•   Faceted Search
•   Administrative Interface
•   Incremental Updates
•   Supports Sharding
•   Index Databases, Local Files and Web Pages
•   Supports Multiple Indexes
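
To make the faceted search feature a little more concrete, here is a minimal sketch of the request parameters involved and of how the counts can be read back. It assumes Solr's JSON response writer with its default flat facet layout; the helper names and the `portal_type` field are illustrative and not taken from the talk.

```python
from urllib.parse import urlencode


def facet_params(query, field):
    """Build the query string for a faceted Solr search (sketch only)."""
    return urlencode([
        ("q", query),
        ("facet", "true"),        # turn faceting on
        ("facet.field", field),   # field to count distinct values for
        ("wt", "json"),           # ask for a JSON response
    ])


def pair_facet_counts(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# 'portal_type' is a hypothetical field name used for illustration.
print(facet_params("plone", "portal_type"))
print(pair_facet_counts(["Document", 12, "Folder", 3]))
```

The flat list is what appears under `facet_counts` / `facet_fields` in the response when no other list style is requested.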
Solr Features
•   Stopwords
•   Synonyms
•   Highlighted Context Snippets
•   Spelling Suggestions
•   More Like This Suggestions
•   Supports Rich Documents
Solr Performance
•   Wiktionary Dataset
•   49.5 million lines of XML
•   1.3 GB of data
•   1.7 million pages indexed in 5.5 hours
•   ZODB size after import: 1.1 GB
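
As a rough sanity check on these figures, and assuming the 5.5 hours covers indexing all 1.7 million pages, the average throughput can be worked out as follows. The variable names are illustrative only.

```python
pages = 1_700_000        # pages indexed, from the slide
hours = 5.5              # total indexing time, from the slide

seconds = hours * 3600   # 5.5 hours expressed in seconds
pages_per_second = pages / seconds

print(round(pages_per_second, 1))  # → 85.9
```

This is an average over the whole import, so it says nothing about how the rate varied during the run.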
Integration Options with Plone



•   collective.solr




collective.solr Issues
•   Monkey Patching
•   Relies on collective.indexing
•   Duplicates all indexes
•   Sub-Optimal Integration with Zope Transactions
•   Relies on Thread Locals
What to do?


Reevaluate


Solr Integration as a Catalog Index



•   No Monkey Patching
•   Simpler Code




Enter alm.solrindex
•   ZCatalog Index
•   Doesn't depend on Plone
•   Utilizes new foreign_connections Connection Method
•   Pass through Solr Queries
•   Direct access to the Solr Response
Sorting


•   Still handled by the ZCatalog
•   Could change in the future




alm.solrindex Field Handlers


•   Handle Parsing Attributes for Indexing
•   Translate field-specific queries to Solr
•   Zope Utilities
Example Handler
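    class TextFieldHandler(DefaultFieldHandler):

        def parse_query(self, field, field_query):
            name = field.name
            request = {name: field_query}
            record = parseIndexRequest(request, name, ('query',))
            if not record.keys:
                return None
            query_str = ' '.join(record.keys)
            if not query_str:
                return None
            return {'q': u'+%s:%s' % (name, quote_query(query_str))}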
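
The translation step a text field handler performs can be tried outside Zope with a small stand-alone sketch. Everything here is an assumption made for illustration: `quote_term_sketch` and `parse_query_sketch` are hypothetical stand-ins, not the alm.solrindex helpers, and grouping the terms in parentheses is a choice of this sketch so that the field prefix applies to every term.

```python
import re


def quote_term_sketch(term):
    """Hypothetical stand-in for query quoting: wrap one term in double
    quotes and backslash-escape embedded quotes and backslashes."""
    return '"%s"' % re.sub(r'(["\\])', r'\\\1', term)


def parse_query_sketch(name, terms):
    """Translate a field name and its search terms into a Solr 'q' value.

    Returns None when there is nothing to search for, mirroring the
    early returns in the handler on the slide.
    """
    quoted = [quote_term_sketch(t) for t in terms if t]
    if not quoted:
        return None
    # Parentheses are this sketch's choice, so the field applies to all terms.
    return {'q': '+%s:(%s)' % (name, ' '.join(quoted))}


print(parse_query_sketch('Title', ['plone', 'solr']))
```

The leading `+` marks the clause as required in Lucene query syntax, as in the slide's handler.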
Other alm.solrindex Features


•   GenericSetup Profile
•   Tests
•   Uses solrpy instead of the unsupported solr.py




Tips
•   Can replace several ZCatalog indexes
•   Remove any indexes you have replaced
•   Use it for all Text Indexes
•   Still Utilize the ZCatalog Indexes for Everything Else
Demo
  Project Gutenberg Data
Questions?


sixfeetup.com/deploy
Enterprise search in Plone using Solr

Out of the box, Plone includes an integrated and powerful search engine with features such as live search and full text indexing. Sometimes this isn't enough or you need more robust search features to provide your site visitors with a more custom search experience.

In this talk, Six Feet Up CTO Calvin Hendryx-Parker will go into the details of implementing Solr with Plone for a large project. Solr is an enterprise search engine that can be deployed alongside Plone.

Some of the topics to be discussed include:
weighted search
thesaurus
spell check
flexible query parsing
faster search performance
and more...


  • enterprise search server

    XML, JSON, HTTP (REST)

    Efficient Replication to other Solr Search Servers

    Full Plugin Architecture
  • XML over HTTP

    HTTP GET returns XML results
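
    As a minimal sketch of the XML-over-HTTP style noted above, the snippet below builds an `<add>` document of the kind posted to Solr's update handler, and a GET URL for the select handler. The base URL and the helper names are assumptions for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed location of a local Solr instance; adjust for a real deployment.
SOLR_URL = "http://localhost:8983/solr"


def build_add_xml(doc):
    """Serialize a dict of field values as a Solr <add><doc> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build the GET URL for a query against the select handler."""
    return "%s/select?%s" % (SOLR_URL, urlencode({"q": query, "rows": rows}))


print(build_add_xml({"id": "doc1", "title": "Hello"}))
print(build_select_url("title:plone"))
```

    Posting the XML to `/update` (followed by a commit) indexes the document, and fetching the select URL returns the matching results.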


  • very similar to the ZCatalog

    Numeric Types, Dynamic Fields, Unique Keys

    rich set of debugging tools



  • 17GB before packing
    still using Zope in the middle for this
    did the same import with ZCTextIndex and ZCatalog, and it crashed and burned at 900k items

  • collective.solr is designed to work with collective.indexing, which is a collection of more monkey patches

    duplicates every index except SearchableText, which it removes from the catalog
    any time you add an index to the catalog, you have to add it to Solr as well

    global or thread-local variables usually make code much harder to understand
    horrible code readability
    could lead to unexpected results; maybe your Solr connection isn't there anymore

  • This is open source software so we can learn from past experiences of others.
    Six Feet Up embarked on a large project where flexible search features were a great match for Solr.
    We shouldn't throw away everything in collective.solr.
  • less indirection
    allow us to re-factor our connection to Solr
  • only one needed per catalog
    uses the solr schema to determine what columns to index


    maintains a persistent connection to an external database from a ZODB object
    won't be dropped by object deactivations, as _v_ (volatile) attributes are
    won't break when you pass control between threads, as thread locals do

    Pass Solr query parameters directly via the solr_param dictionary.
    Requires that some value be passed for "q".
    Can be used to access features like weighting of terms in a query.

    Pass in a solr_callback function in your query and SolrIndex will call it passing the parsed Solr response object.
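
    A minimal sketch of what such a query might look like follows. The key names `solr_param` and `solr_callback` are taken from the notes above; how they are passed to the catalog, the field names in the query, and the `numFound` attribute on the stand-in response are all assumptions for illustration. The catalog call itself is only shown in a comment, since it needs a running Zope site.

```python
from types import SimpleNamespace

captured = {}


def remember_response(response):
    """Callback: keep the parsed Solr response for later inspection."""
    captured["response"] = response


query = {
    # Raw Solr parameters; "q" must be given. The ^2 boost weights
    # matches in Title twice as heavily as matches in Description.
    "solr_param": {"q": "Title:plone^2 OR Description:plone"},
    "solr_callback": remember_response,
}

# In a real site this would be something like: results = catalog(**query)
# Here a stand-in response object is passed to the callback instead.
fake_response = SimpleNamespace(numFound=3)  # hypothetical attribute
query["solr_callback"](fake_response)

print(captured["response"].numFound)
```

    Keeping the callback separate from the result set lets the calling code read extra response data without changing what the catalog returns.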


  • Not handled by SolrIndex at this time.

    Just use the sort_on parameter like you normally would.
  • write your own and register them via ZCML


  • Avoids indexing the same attribute multiple times.

    ZCatalog falls short on full text indexing when it comes to performance and features

    Native ZCatalog indexes are faster than any network bound service.
    ZCatalog Indexes share transaction aware ZODB caches.
  • 32K Books
    richer metadata than the wiktionary data



  • Enterprise search in Plone using Solr

    1. Enterprise Search in Plone using Solr • Calvin Hendryx-Parker • Plone Symposium East 2010 • nowhere to go but open source • sixfeetup.com
    2. What is Solr? • Java Based • Full-Text Search • Web Services API • Standards Based Interfaces • Scalable • XML Configuration • Extensible • sixfeetup.com/deploy
    3. Playing with Solr • Indexing • Query
    4. nowhere to go but open source • sixfeetup.com
    5. nowhere to go but open source • sixfeetup.com
    6. Solr Features • Data Schema • Faceted Search • Administrative Interface • Incremental Updates • Supports Sharding • Index Databases, Local Files and Web Pages • Supports Multiple Indexes
    7. Solr Features • Stopwords • Synonyms • Highlighted Context Snippets • Spelling Suggestions • More Like This Suggestions • Supports Rich Documents
    8. nowhere to go but open source • sixfeetup.com
    9. nowhere to go but open source • sixfeetup.com
    10. nowhere to go but open source • sixfeetup.com
    11. Solr Performance • Wiktionary Dataset • 49.5 million lines of XML • 1.3 GB of data • 1.7 million pages indexed in 5.5 hours • ZODB size after import: 1.1 GB
    12. Integration Options with Plone • collective.solr
    13. collective.solr Issues • Monkey Patching • Relies on collective.indexing • Duplicates all indexes • Sub-Optimal Integration with Zope Transactions • Relies on Thread Locals
    14. What to do?
    15. Reevaluate
    16. Solr Integration as a Catalog Index • No Monkey Patching • Simpler Code
    17. Enter alm.solrindex • ZCatalog Index • Doesn't depend on Plone • Utilizes new foreign_connections Connection Method • Pass through Solr Queries • Direct access to the Solr Response
    18. nowhere to go but open source • sixfeetup.com
    19. nowhere to go but open source • sixfeetup.com
    20. Sorting • Still handled by the ZCatalog • Could change in the future
    21. alm.solrindex Field Handlers • Handle Parsing Attributes for Indexing • Translate field-specific queries to Solr • Zope Utilities
    22. Example Handler

        class TextFieldHandler(DefaultFieldHandler):

            def parse_query(self, field, field_query):
                name = field.name
                request = {name: field_query}
                record = parseIndexRequest(request, name, ('query',))
                if not record.keys:
                    return None
                query_str = ' '.join(record.keys)
                if not query_str:
                    return None
                return {'q': u'+%s:%s' % (name, quote_query(query_str))}

    23. Other alm.solrindex Features • GenericSetup Profile • Tests • Uses solrpy instead of the unsupported solr.py
    24. Tips • Can replace several ZCatalog indexes • Remove any indexes you have replaced • Use it for all Text Indexes • Still Utilize the ZCatalog Indexes for Everything Else
    25. Demo • Project Gutenberg Data
    26. Questions?
    27. sixfeetup.com/deploy
