Getting started faster with LucidWorks for Solr


* Open source search with Solr/Lucene gives you the power to turn a wide range of information into fast, useful, relevant results!
* LucidWorks for Solr gives you a tested, release-stable certified distribution of open source search with enhanced tools and installation for building search apps quickly and reliably.
http://www.lucidimagination.com/How-We-Can-Help/webinar-from-search-to-found

Presentation Transcript

  • From Search to Found
    Grant Ingersoll - Eran Yaniv
    Thursday, August 6, 2009
  • Agenda
    • Introductions
    • Apache Solr background
    • LucidWorks for Solr
    • Installing LucidWorks for Solr
    • Searching your domain with Solr
    • Putting Solr into production
    • Questions
    Lucid Imagination, Inc.
  • Introductions
    • Grant Ingersoll
      • Lucene/Solr committer
      • Co-founder, Apache Mahout project
      • Co-author of the upcoming “Taming Text”
    • Eran Yaniv
      • Lucid Solutions Manager
      • Background: product management, enterprise development/IT, information retrieval
  • Apache Solr Background
    • Lucene-based search server plus many enterprise tools
      • REST-like API
      • Faceting
      • Distributed/Replication
      • Easy configuration
      • Many other features: http://lucene.apache.org/solr/features.html
    • Created at CNET by Yonik Seeley (Lucid co-founder)
    • Donated to the Apache Software Foundation in 2006
    • Solr 1.4 release coming soon
  • Solr Basics
    • Content is modeled via Documents and Fields
      • Content can be text, integers, floats, dates, or custom types
      • Analysis can be employed to alter content before indexing
      • Controlled via schema.xml
    • Searches are supported through a wide range of Query options
      • Keyword, terms, phrases, wildcards, and others
    • Many clients available: HTTP, Java, Ruby, PHP, .NET, etc.
  • Solr Basics
    • Schema
      • Define Field Types, Fields, field metadata, and Analysis
      • <field name="name" type="text" indexed="true" stored="true"/>
      • Copy Fields, Dynamic Fields, Similarity overrides
    • Solr Config
      • Define low-level Lucene controls
      • Specify how clients interact with Solr via Request Handlers (“mini servlets”)
      • Configure highlighting, spell checking, admin, etc.
  • LucidWorks for Solr
    • Based on Apache Solr 1.3, plus:
      • Installer for Linux and Windows
      • Specific patches from Solr (faceting improvements, others)
    • 30-day free “Get Started” program
    • Bundled:
      • JRE
      • Apache Tomcat
      • Optimized KStemmer implementation
      • Luke
      • Lucid Gaze for Solr
  • Getting Started
    1. Install Lucid Works
    2. Model your domain
    3. Index your content
    4. Test
    5. Deploy
  • Install Lucid Works
    • Free certified distribution
      • Introduced to many new users
      • New users frequently use “Get Started”
      • Over 50% of the cases: “How to install”
    • Installer
      • Simple
      • Plugins and enhancements
      • Updateable
      • Support for Linux, Windows (Mac?)
      • UI and headless
  • Installer Overview
    • Solr installer service
      • Hosted on lucidimagination.com
      • Public repository
      • Manages repositories
    • Solr installer client
      • Install/Uninstall
      • Certified vs. beta
      • Check/install updates
      • Password-protected install/update components
      • Upgrade to platform
      • Early adopters
      • Dev - internal
  • Starting Lucid Works
    • cd <INSTALL_PATH>/lucidworks
    • ./lucidworks.sh start (*NIX)
    • .\lucidworks.bat start (Windows)
    • Point your browser at http://localhost:8983/solr/
  • Master Your Domain with Solr
    • Get to know your content
    • Get to know your users
    • Model in Solr
  • Modeling your Content
    • Collection/Aggregate
      • Examine collection-level stats, like:
        • MIME types
        • Number of docs
        • Update rates
        • Languages present
        • Much, much more
      • Look for patterns and relationships
      • Identify helpful resources
  • Modeling your Content
    • Randomly sample a set of your documents
    • Look for:
      • Common structures like titles, tables, columns, etc.
      • Important metadata
      • Tokenization issues (try them out in http://localhost:8983/solr/admin/analysis.jsp)
      • Importance indicators
    • May also look at paragraph, sentence, word, and character issues
    • Often useful to run docs through the indexing process iteratively
  • Understanding your Users
    • UI expectations
    • Speed and relevance
    • Search and discovery
      • Search
      • Faceting
      • Did you mean?
      • Similar Pages (More Like This)
      • Highlighting
      • Document/Results Clustering
  • Build your Application
    • Map your content into Documents and Fields via the Solr schema
    • Set up your Solr access patterns in solrconfig.xml
    • Index your content
    • Search
  • Indexing
    • Many clients: Java, PHP, Ruby, etc.
    • See example/exampledocs
    • Pull from DB, others
    • Upload CSV, Solr XML:
      <add><doc>
        <field name="id">EN7800GTX/2DHTV/256M</field>
        <field name="manu">ASUS Computer Inc.</field>
        <field name="cat">electronics</field>
      </doc></add>
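
The Solr XML message on the indexing slide can also be generated from a client script. What follows is a minimal sketch in Python using only the standard library; the helper name `build_add_message` is hypothetical and is not part of Solr or LucidWorks. It only builds the payload and does not send it anywhere.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of {field name: value} dicts.

    Hypothetical helper for illustration; it only produces the XML string.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


payload = build_add_message([
    {
        "id": "EN7800GTX/2DHTV/256M",
        "manu": "ASUS Computer Inc.",
        "cat": "electronics",
    },
])
print(payload)
```

The resulting string would typically be posted to Solr's update handler and followed by a commit; the exact URL depends on how the instance is deployed.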
  • Search
    • Clients also support search through API calls
    • HTTP support by definition:
      • http://localhost:8983/solr/select/?q=*:*&fl=score,id
      • http://localhost:8983/solr/select/?q=name:iPod&fl=score,id
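
The select URLs on the search slide can be assembled programmatically so that query characters are escaped correctly. This is a small sketch, assuming the default local address shown on the slide; `select_url` is a hypothetical helper, and the sketch builds the URL without issuing the request.

```python
from urllib.parse import urlencode

# Address taken from the slide; adjust for your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def select_url(query, fields=("score", "id"), **extra):
    """Return a Solr select URL for `query`, requesting the given fields.

    Extra keyword arguments are passed through as additional parameters.
    """
    params = {"q": query, "fl": ",".join(fields)}
    params.update(extra)
    return SOLR_SELECT + "?" + urlencode(params)


print(select_url("*:*"))
print(select_url("name:iPod"))
print(select_url("*:*", debugQuery="true"))
```

The printed URLs are percent-encoded (for example `*:*` becomes `%2A%3A%2A`), which the server decodes back to the same query as the hand-written form.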
  • Load Testing
    • Solr scales quite well, but you should still load test to establish performance specs for your application
    • Apache JMeter can be a good start
    • Ideally, play back old logs at the rate they occurred
    • As with any Java application, keep an eye on JVM factors like heap size and garbage collection
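
Playing back old logs "at the rate they occurred" comes down to preserving the gaps between the original requests. The sketch below computes those gaps; the log format (a list of timestamp and query pairs) and the function name `replay_schedule` are assumptions made for illustration, and a real replay tool would sleep for each delay before sending the query.

```python
from datetime import datetime


def replay_schedule(entries):
    """Return (delay_seconds, query) pairs that preserve the original spacing.

    `entries` is a list of (timestamp, query) tuples; this format is assumed.
    The first query gets a delay of 0.0; each later one waits for the gap
    that separated it from the previous request in the log.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    schedule = []
    previous = None
    for timestamp, query in ordered:
        delay = 0.0 if previous is None else (timestamp - previous).total_seconds()
        schedule.append((delay, query))
        previous = timestamp
    return schedule


log = [
    (datetime(2009, 8, 6, 10, 0, 0), "ipod"),
    (datetime(2009, 8, 6, 10, 0, 2), "name:iPod"),
    (datetime(2009, 8, 6, 10, 0, 2, 500000), "cat:electronics"),
]
for delay, query in replay_schedule(log):
    print(delay, query)
```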
  • Improving Performance
    • Search
      • Avoid wildcards, or at least require a prefix
      • Catch-all field for “generic” search
      • Choose the proper faceting method for the situation
      • Replicate/Shard
    • Indexing
      • Minimal analysis to achieve results (speeds indexing)
      • Multi-threaded, batch submission
    • Usual suspects: CPU, memory, disk, JVM
    • http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/
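
Batch submission means sending many documents per update request instead of one request per document. A minimal sketch of the grouping step follows; `batches` is a hypothetical helper, and the batch size is something to tune during load testing rather than a recommended value.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


for batch in batches(range(5), 2):
    print(batch)
```

Each yielded batch could be turned into one update message, and separate batches could be handed to worker threads for the multi-threaded case.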
  • Relevance Testing
    • Often overlooked until there is a problem; instead, plan for it up front
    • Types:
      • Ad hoc
      • Log-based / QA-driven
      • Standard collections and queries (TREC)
    • Best practice: take the top 50 or so queries by volume, plus ~20 random queries, and rate the top ten results as relevant, somewhat relevant, not relevant, or embarrassing
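
The best practice above can be sketched as two small steps: picking the test queries from a query log, then tallying the ratings. The function names, the flat list-of-strings log format, and the fixed random seed are all assumptions for illustration.

```python
import random
from collections import Counter

# The four rating levels named on the slide.
RATINGS = ("relevant", "somewhat relevant", "not relevant", "embarrassing")


def pick_test_queries(query_log, top_n=50, random_n=20, seed=0):
    """Return the top_n queries by volume plus random_n sampled from the rest."""
    counts = Counter(query_log)
    top = [query for query, _ in counts.most_common(top_n)]
    rest = sorted(set(counts) - set(top))
    rng = random.Random(seed)  # fixed seed keeps the test set repeatable
    sample = rng.sample(rest, min(random_n, len(rest)))
    return top + sample


def summarize(judgments):
    """Count how many rated results fall into each rating level."""
    tally = Counter(judgments)
    return {rating: tally.get(rating, 0) for rating in RATINGS}


queries = pick_test_queries(["ipod"] * 3 + ["solr"] * 2 + ["asus", "luke"],
                            top_n=2, random_n=1)
print(queries)
print(summarize(["relevant", "embarrassing", "relevant"]))
```

Tracking the share of "embarrassing" results over time gives a simple regression signal when analysis or boosts are changed.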
  • Troubleshooting Relevance in LucidWorks for Solr
    • Add &debugQuery=true to any query
      • Provides info on why a doc scored the way it did, plus other info about the query
      • http://localhost:8983/solr/select/?q=*:*&debugQuery=true
    • Solr’s built-in LukeRequestHandler
    • Luke, the Lucene index browser
      • lucidworks/luke.(sh|bat)
  • Improving your Search
    • Common techniques
      • Analysis: lowercasing, stemming, synonyms, stopwords, compound analysis (e.g. STR-AV220 -> STR AV 220)
      • Boosts (query and index)
      • Faceting and other navigational aids
      • Spell checking
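
The compound-analysis example (STR-AV220 -> STR AV 220) splits a token at punctuation and at letter/digit boundaries. The sketch below imitates that effect with a regular expression. It is not Solr's implementation, where this kind of splitting is configured in the analysis chain in schema.xml, and the function name is hypothetical.

```python
import re


def split_compound(token):
    """Split a token into runs of letters and runs of digits.

    Punctuation such as '-' acts as a separator and is dropped.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


print(split_compound("STR-AV220"))
```

Applying the same split at both index and query time lets a search for "AV 220" or "STR AV220" reach the same document.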
  • Improving your Queries
    • Disjunction Max Query (more in a minute)
    • Better stop word handling
    • Phrase Queries and other position-based queries
      • "quick red fox"~3
    • Recency/Freshness
    • Invisible Queries
    • Relevance Feedback and “More Like This”
    • Fake Queries
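
In the Lucene query syntax, `"quick red fox"~3` is a phrase query with a slop of 3, meaning the words may be up to three position moves away from the exact phrase. A small sketch for building such query strings on the client side follows; `phrase_query` is a hypothetical helper and only handles escaping of quotes and backslashes inside the phrase.

```python
def phrase_query(words, slop=0, field=None):
    """Build a Lucene-syntax phrase query string, optionally with slop and field."""
    phrase = " ".join(words).replace("\\", "\\\\").replace('"', '\\"')
    query = f'"{phrase}"'
    if slop > 0:
        query += f"~{slop}"
    if field:
        query = f"{field}:{query}"
    return query


print(phrase_query(["quick", "red", "fox"], slop=3))
print(phrase_query(["quick", "red", "fox"], slop=3, field="name"))
```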
  • Disjunction Max Query
    • Useful when searching across multiple fields
    • Example (thanks to Chuck Williams)
      • Query: t:elephant d:elephant t:albino d:albino
      • Doc1: t: elephant, d: elephant
      • Doc2: t: elephant, d: albino
      • Each doc scores the same for BooleanQuery
      • DisjunctionMaxQuery scores Doc2 higher
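
The elephant/albino example can be worked through with a deliberately simplified scoring model. The sketch assumes every matching clause scores exactly 1.0, which ignores the tf-idf weighting, norms, and other factors in real Lucene scoring; it only illustrates why summing all clauses ties the two documents while taking the per-term maximum across fields separates them. The optional `tie` parameter mirrors the tie-breaker idea of adding a fraction of the non-maximum field scores.

```python
QUERY_TERMS = ("elephant", "albino")
FIELDS = ("t", "d")

DOC1 = {"t": "elephant", "d": "elephant"}
DOC2 = {"t": "elephant", "d": "albino"}


def clause_score(doc, field, term):
    """Simplified clause score: 1.0 if the field contains the term, else 0.0."""
    return 1.0 if term in doc.get(field, "").split() else 0.0


def boolean_score(doc):
    """Sum over every field:term clause, as a plain OR of all four clauses."""
    return sum(clause_score(doc, f, t) for t in QUERY_TERMS for f in FIELDS)


def dismax_score(doc, tie=0.0):
    """For each term take the best field score, plus `tie` times the others."""
    total = 0.0
    for term in QUERY_TERMS:
        scores = [clause_score(doc, f, term) for f in FIELDS]
        best = max(scores)
        total += best + tie * (sum(scores) - best)
    return total


print(boolean_score(DOC1), boolean_score(DOC2))
print(dismax_score(DOC1), dismax_score(DOC2))
```

Doc1 matches "elephant" twice but never "albino", so the per-term maximum rewards Doc2, which covers both query terms.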
  • Advanced Techniques
    • Payloads
      • http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
      • DelimitedPayloadTokenFilter (better name?)
        • Add payloads inline: foo|2.3 bar|5.4
      • BoostingFunctionTermQuery (Lucene 2.9, Solr 1.4)
    • Natural Language Processing
      • Named Entity Extraction (OpenNLP, Stanford NER, commercial)
      • Sentiment Analysis
      • Event Detection
      • Relationship Identification
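
The inline payload format `foo|2.3 bar|5.4` attaches a numeric value to each token. The sketch below parses that text format in plain Python to show what information each token carries. It is not the Lucene token filter itself, and the choice to return `None` for tokens without a delimiter is an assumption.

```python
def parse_payload_tokens(text, delimiter="|"):
    """Parse 'term|payload' tokens into (term, payload) pairs.

    Tokens without the delimiter get a payload of None (an assumption here).
    """
    tokens = []
    for raw in text.split():
        term, sep, payload = raw.rpartition(delimiter)
        if sep and term:
            tokens.append((term, float(payload)))
        else:
            tokens.append((raw, None))
    return tokens


print(parse_payload_tokens("foo|2.3 bar|5.4"))
```

At query time a payload-aware query can fold these per-token values into the score, which is the role the slide gives to BoostingFunctionTermQuery.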
  • Solr in Production
    • Hardware
    • Monitoring
      • Lucid Gaze for Solr
      • Nagios, Hyperic, port monitoring
    • Troubleshooting
      • Solr community: ad hoc support
      • Lucid Support: commercial support with SLAs
    • Growth
      • Query volume
      • Index size
  • Lucid Gaze for Solr
    • Monitor Solr Request Handlers
    • Comes with LucidWorks for Solr
    • http://localhost:8983/gaze
  • Resources
    • Websites
      • http://www.lucidimagination.com
      • http://search.lucidimagination.com
      • http://lucene.apache.org/solr
    • Solr Support and Training
      • http://www.lucidimagination.com/How-We-Can-Help
      • SLAs; public, private, and online training for Solr and Lucene
    • Mailing Lists
      • solr-user@lucene.apache.org