Solr
 

Solr TechSig 18/4/13 slides.

    Solr Presentation Transcript

    • Solr
    • What is it?
      • Text search index (engine)
      • Open source
      • Not a search product
      • A tool that allows you to create a search solution
    • What is it like?
      • Google, Google Appliance
      • FAST
      • Oracle Secure Enterprise Search
      • etc.
    • Google Appliance:
      • Sucks data in
      • Can’t really configure
      • Stuck with results
      • Bonnet is locked
    • Solr:
      • You need to feed data in
      • Highly configurable
      • Search results can be tuned
      • There is no bonnet
    • Why am I doing a talk?
      • Did a course
      • LucidWorks content
      • Presented by FindWise
      • FindWise are a search specialist that use a range of search engines
    • Caveats
      • Course was in Solr 4.1.0; we use 3.6.1 for APVMA
      • Course focussed on search, not ingestion or presentation
      • Java API recommended for ingestion
      • ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects
    • Where does Solr fit?
    • Application Architecture
    • Apache Tika
      • Data import handler
      • Used to be part of Lucene
      • XML
      • PDF
      • Word
      • Excel
      • etc.
    • Manifold CF
      • Apache
      • Connector framework
      • Used to connect to content repositories (source)
      • SharePoint
      • Documentum
      • CMIS
      • JDBC
      • RSS
    • Hydra
      • FindWise
      • Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup
      • A validation failure is inconvenient: the whole job fails
      • Feed in clean data
      • Use Hydra for cleanup
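The "feed in clean data" advice can be illustrated with a small pre-index cleanup pass. This is only a sketch of the idea in plain Python; it does not use or imitate Hydra's API. The required field names (`id`, `title`) and both function names are assumptions made for the example.

```python
# Hypothetical pre-index cleanup step; this is NOT Hydra's API.
# Assumption: these names mirror the required="true" fields in schema.xml.
REQUIRED_FIELDS = ("id", "title")


def clean_document(raw):
    """Return a cleaned copy of raw, or None if a required field is missing."""
    doc = {}
    for name, value in raw.items():
        if value is None:
            continue  # drop empty values rather than sending them to Solr
        if isinstance(value, str):
            value = " ".join(value.split())  # trim and collapse whitespace
            if not value:
                continue
        doc[name] = value
    if any(name not in doc for name in REQUIRED_FIELDS):
        return None
    return doc


def split_batch(raw_docs):
    """Separate documents that are safe to index from those to set aside."""
    accepted, rejected = [], []
    for raw in raw_docs:
        doc = clean_document(raw)
        if doc is None:
            rejected.append(raw)
        else:
            accepted.append(doc)
    return accepted, rejected
```

Setting bad documents aside before the feed means one incomplete record no longer causes the whole indexing job to fail on a `required` check.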
    • Apache ZooKeeper
      • Used for SolrCloud
      • Clustering and sharding
      • Solr 4.1.0 only
      • Side project for Hadoop
      • Used to manage Hadoop clusters
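For readers new to sharding, the general idea of spreading documents across several indexes can be sketched as follows. This is a generic illustration of hash-based partitioning that uses CRC32 as a stand-in hash; it is not SolrCloud's own routing code, and the function names are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index in a stable, repeatable way."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, an assumption
    return digest % num_shards


def partition(doc_ids, num_shards):
    """Group document ids into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

Because the shard is derived from the id alone, the same document always lands on the same shard, which is what lets an update replace the earlier copy.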
    • Inside
    • General Approach
      • Design schema
      • Prototyping
      • Integration
    • Design Schema
      • A data modelling exercise
      • schema.xml
      • Dynamic fields can be useful in the first pass: <dynamicField name="*" type="string" indexed="true" />
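As a rough sketch of the data modelling step, the snippet below generates a `<fields>` fragment for schema.xml from a small data model and appends the catch-all dynamic field from the slide. The model itself (`id`, `title`) and the `text_general` field type are assumptions for illustration; check them against the field types defined in your own schema.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical data model: field name -> (field type, required?)
MODEL = {
    "id": ("string", True),
    "title": ("text_general", False),  # assumes this type exists in your schema
}


def build_fields(model, catch_all=True):
    """Build a <fields> element with one <field> per entry in the model."""
    fields = ET.Element("fields")
    for name, (field_type, required) in model.items():
        attrs = {"name": name, "type": field_type,
                 "indexed": "true", "stored": "true"}
        if required:
            attrs["required"] = "true"
        ET.SubElement(fields, "field", attrs)
    if catch_all:
        # First-pass catch-all from the slide: any other field becomes a string.
        ET.SubElement(fields, "dynamicField",
                      {"name": "*", "type": "string", "indexed": "true"})
    return fields


print(ET.tostring(build_fields(MODEL), encoding="unicode"))
```

Once the real fields are known, passing `catch_all=False` drops the wildcard so that unexpected field names are no longer silently accepted.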
    • Prototyping
      • Get the data in (index)
      • csv, XML, JSON
      • post.jar
      • URL to search and inspect raw results
      • ‘browse’ interface allows developer to understand how the search is working
      • solrconfig.xml
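The prototyping loop of getting data in and then inspecting raw results by URL might look like the sketch below. It only builds the XML update message and the search URL; it does not contact a server. The base URL is an assumption (a local example install), and on a multi-core setup such as the Solr 4.x example the path usually includes a core name, so adjust it to match yours.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local example server


def add_message(docs):
    """Build a Solr XML update message: <add><doc><field name="...">."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def select_url(query, rows=10, base=SOLR_BASE):
    """Build a search URL that returns raw, indented JSON results."""
    params = {"q": query, "rows": rows, "wt": "json", "indent": "true"}
    return base + "/select?" + urlencode(params)
```

The string from `add_message` can be saved to a file and sent with post.jar, and the URL from `select_url` can be pasted into a browser to look at the raw response.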
    • Integration
      • Not covered
      • Content ingestion
      • Presentation of results
      • Up to you…
    • Demo