Solr

What is it?
• Text search index (engine)
• Open source
• Not a search product
• A tool that allows you to create a search solution

What is it like?
• Google, Google Appliance.
• FAST
• Oracle Secure Enterprise Search
• etc.

Google Appliance:
• Sucks data in
• Can’t really configure
• Stuck with results
• Bonnet is locked

Solr:
• You need to feed data in
• Highly configurable
• Search results can be tuned
• There is no bonnet

Why am I doing a talk?
• Did a course
• LucidWorks content
• Presented by FindWise
• FindWise is a search specialist that uses a range of search engines

Caveats
• Course used Solr 4.1.0; we use 3.6.1 for APVMA
• Course focussed on search, not ingestion or presentation
• Java API recommended for ingestion
• ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects

Apache Tika
• Data import handler
• Used to be part of Lucene
• XML
• PDF
• Word
• Excel
• etc.

ManifoldCF
• Apache
• Connector framework
• Used to connect to content repositories (source)
• SharePoint
• Documentum
• CMIS
• JDBC
• RSS

Hydra
• FindWise
• Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup.
• Validation failure is inconvenient: the whole job fails
• Feed in clean data.
• Use Hydra for cleanup.
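
To make the "feed in clean data" point concrete, here is a minimal sketch of a pre-ingestion check in plain Python. It is not Hydra's API and not anything from the course. `split_clean` and the field names are hypothetical, chosen only to show the idea of setting aside bad documents before they reach Solr, so that one bad record does not fail the whole job.

```python
def split_clean(docs, required_fields):
    """Split docs into (clean, rejected) before anything is sent to Solr.

    Hypothetical helper for illustration only; a real pipeline tool
    would do far more than check for missing values.
    """
    clean, rejected = [], []
    for doc in docs:
        # A field counts as missing if it is absent, None, or blank.
        missing = [
            name for name in required_fields
            if doc.get(name) is None or str(doc.get(name)).strip() == ""
        ]
        if missing:
            rejected.append((doc, missing))
        else:
            clean.append(doc)
    return clean, rejected


docs = [
    {"id": "1", "title": "Solr overview"},
    {"id": "2", "title": "   "},   # blank title
    {"title": "No id here"},       # id absent
]
clean, rejected = split_clean(docs, ["id", "title"])
```

Only `clean` would go on to indexing. `rejected` keeps each bad document together with the names of the fields it was missing, so it can be fixed or reported separately.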

Apache ZooKeeper
• Used for SolrCloud
• Clustering and sharding
• Solr 4.x only (not in the 3.6.1 we use)
• Side project for Hadoop
• Used to manage Hadoop clusters

General Approach
• Design schema
• Prototyping
• Integration

Design Schema
• A data modelling exercise
• schema.xml
• Dynamic fields can be useful in the first pass:
<dynamicField name="*" type="string" indexed="true" />
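
As a rough illustration of how a dynamic field pattern picks up field names that the schema does not declare explicitly, here is a minimal Python sketch. It is not Solr code. `matches_dynamic_field` is a hypothetical helper, and it assumes the pattern is either `*` on its own or has a single `*` at the start or end of the name.

```python
def matches_dynamic_field(pattern: str, field_name: str) -> bool:
    """Return True if field_name matches a Solr-style dynamic field pattern.

    Assumption: the pattern is "*" alone, or has one "*" at the start
    (e.g. "*_s") or at the end (e.g. "attr_*").
    """
    if pattern == "*":
        # Catch-all: every field name matches.
        return True
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    # No wildcard: behaves like an ordinary exact field name.
    return pattern == field_name


# The catch-all pattern from the slide accepts any incoming field.
catch_all = [matches_dynamic_field("*", name) for name in ("title", "author", "x_y_z")]
```

A catch-all like this is convenient for a first pass because every incoming field is accepted as a string. Narrower patterns such as `*_s` can replace it once the data model settles.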

Prototyping
• Get the data in (index)
• csv, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘browse’ interface allows developer to understand how the search is working
• solrconfig.xml
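
The two prototyping steps above (get data in, then inspect raw results by URL) can be sketched in Python using only the standard library. This is an illustration under assumptions, not the course material. `SOLR_BASE`, `build_add_xml` and `build_select_url` are hypothetical names, and the base URL assumes a default local install, so adjust the host, port and path (4.x installs typically include a core name). Nothing here contacts a server; the functions only build the XML update message and the query URL.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: default local Solr endpoint; change to match your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(docs):
    """Build a Solr XML update message (<add><doc>...</doc></add>) from dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build a URL that returns raw, indented JSON results for a query."""
    params = {"q": query, "rows": rows, "wt": "json", "indent": "true"}
    return SOLR_BASE + "/select?" + urlencode(params)


xml_message = build_add_xml([{"id": "1", "title": "Hello Solr"}])
search_url = build_select_url("title:solr")
```

The XML message could be saved to a file and sent with post.jar, and the URL pasted into a browser to inspect the raw response.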

Integration
• Not covered
• Content ingestion
• Presentation of results
• Up to you…
