Open Source Search Applications

Thousands of organizations around the world, including AT&T, Sears, Ford, Verizon, The Guardian, Elsevier, Cisco, Macy’s and more have found their solution: Lucene/Solr open source, the world’s most popular search technology. Our new white paper “A Manager’s Guide to Real World Open Source Search Applications” provides numerous case studies across various industries and business models to show how real-world businesses have turned Lucene/Solr open source search into competitive advantage. http://www.lucidimagination.com/files/file/whitepaper/LIWP_LuceneSolrRealWorldSearch.pdf


The Case for Lucene/Solr: A Manager’s Guide to Real World Open Source Search Applications

By Lucid Imagination

Abstract

In today’s information-driven environment, search is a critical solution to problems when it slashes the time and effort separating end users from the data they value. Search spans the range of business models and use cases, from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead. Making the best use of search requires two perspectives: both a look at the business requirements for a search application and a view to new business opportunities created by using search to leverage the organization’s content resources.

Thousands of organizations across different sectors and business models have harnessed Apache Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this broad adoption is the extraordinary power, scalability, and versatility of open source search technologies.

This paper provides an overview of both the requirements and the opportunities for search applications. It then explores how real world organizations are successfully using Lucene/Solr search applications to meet those opportunities, presenting how the technology is used for specific business models and use cases across industries. In addition, it offers a baseline for setting search requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open source search technology to the unique needs of their business.

© 2010, Lucid Imagination
The Case for Lucene/Solr: Real World Search Applications
A Lucid Imagination White Paper • January 2010

Table of Contents

Introduction
Understanding Search Opportunities and Requirements
    What Data and Documents Are You Searching?
    Who Needs the Results and Why?
    Where Is Search Integrated with IT Infrastructure?
    How Is the Search Interface Presented to the User?
The Real World: Applications and Case Studies
    Yellow Pages, Local Search, and Searching Classifieds
    Media
    E-commerce
    Job and Career Sites
    Libraries, Archives, and Museums (LAMs) Search
    Social Media Search
    Enterprise (Intranet) Search
Business Use Case Matrix
Appendix: Lucene/Solr Features and Benefits

Introduction

As fast as companies, communities, and consumers produce data (about each other, products, opinions, research, and everything else imaginable), they need faster, more versatile search capabilities to find the information they need to create opportunities for competitive advantage. In today’s information-driven environment, search addresses the critical problems created by the explosive growth of content by slashing the time and effort users expend in finding data they value. Search spans the range of business models and use cases: from driving direct customer sales, to analytics and business intelligence, employee productivity, and reduced administrative overhead.

Apache Lucene/Solr [1] open source search technology has been implemented across the broadest range of applications and business models, and likely in ways that can fit the needs of your organization. In successful operation today at thousands of enterprises, Lucene/Solr technology scales from tens of thousands to hundreds of millions and billions of documents; searches data that is structured, unstructured, and in combination; reaches data inside and outside the firewall; and ranges in use from a simple website search box to sophisticated faceted navigation. It addresses equally diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr helps users find, make sense of, and act upon information quickly and efficiently.

In this white paper, we’ll review real-world case studies for Lucene/Solr functionality across business sectors to demonstrate its versatility and varied applicability.
The diversity of examples provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The examples also attest to the innovation and transparency inherent to the open source development model. Our focus is on familiarizing the audience of business managers and application owners with existing Lucene/Solr applications; the substantial technical advantages to developers are covered elsewhere.

[1] Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix.

We’ll first survey the key requirements and business use cases of search and then look at where they are built into search applications. Our objective is to provide business managers and application owners with a broad perspective on how Lucene/Solr search technology is used to build solutions to compelling business problems. In the Appendix, we provide an overview of Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the broadest range of business needs.

Understanding Search Opportunities and Requirements

Search technology has come a long way from its roots in matching keywords with appearance in documents and obtaining undifferentiated results. Search today empowers users by delivering actionable information quickly and efficiently, across multiple, diverse sources of data. The business use cases range from executing mission critical commercial transactions (e.g., e-commerce sites) to unlocking employee and end-user productivity in the search for a single relevant document (e.g., enterprise search).

Given the breadth of capability of the problem domain, it’s useful to look at search and ask two fundamental questions: “How can it solve my business problems?” and “What new business opportunities can search open up?”

In considering how search technology solves business problems, it is useful to start by spelling out the requirements you’ll need to consider for your search application. At the same time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as they can help open up new frontiers for incorporating search and leveraging more value from data repositories.

Starting with some basic questions (what, who, how, and where), you can clarify the high-level business requirements specific to your business needs, which in turn allow you to make the best decisions for your search application. The process of looking at the fundamentals also raises new questions about how and where the search technology offered by Lucene and Solr can create new business opportunities.

Let’s look at four fundamental questions you should address in understanding search opportunities and requirements:

• What data and documents are you searching?
• Who needs the results and why?
• Where is search integrated with IT infrastructure?
• How is the search interface presented to the user?

What Data and Documents Are You Searching?

Business today is driven more than ever by the end users’ creation and consumption of real-time information. A key differentiating capability of search technology is ingesting a broad range of content types and processing large collections of diverse data in real time in order to deliver actionable information. Two aspects to consider:

• Types of Content
  Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint presentations, Excel spreadsheets, Word documents, log files, multimedia content, and more. Content resides in various repositories, including databases, file servers, content management systems, archiving systems, collaboration applications, and employee desktops and laptops. Search technology must be able to locate, organize, and aggregate data whatever its form or location.

• Frequency of Updating Content
  Organizations update content at varying intervals, driven by differing business processes and models: social media or news applications have real-time content needs, whereas an e-commerce application might re-index in response to new inventory on a batch basis, and a research institution might add to its collection less often still. Search applications need to be adaptable to the differences in content change frequency.

Who Needs the Results and Why?

Business search puts a high priority on end user experience and on results in which the searched content is tuned to the unique needs of each user. After all, the human dimension (the usefulness of results and the efficacy of interaction) is the acid test of a search application.
Internet search applications like Google, Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities of the search experience, but they solve a very different problem.

While Internet searches can produce millions of results in milliseconds, they rely on measures like website popularity or URLs and domain names, which are neither relevant nor generally applicable to purpose-built applications for businesses. What’s more, they rely on generalizing relevancy for a global population of all Internet users, without being tied to business rules, business process logic, or the opportunity cost of improved precision for a specific set of data or search users.

Business search applications cannot rely on such coarse, brute-force approaches to tune their results. They need far more control and precision. They have to be able to deliver highly useful results while matching, if not exceeding, the levels of user experience that people have come to expect by virtue of their daily interactions with commercial search engines. Key points of consideration from a business perspective are:

• Relevance
  Relevance is entirely a function of the goals of the search application’s users. The application must have the mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easier ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper will make results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow search scope can also lead to a richer set of more useful results, with less time and effort.

• Cost of Relevance
  As business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to search, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should help make employees more productive by reducing the amount of time and effort needed to find the documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.

• Precision Ranking
  Result accuracy, sorted by attributes like relevance, date, field, or any document property feature, makes the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.

• Query Response Speed
  Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. Fast, relevant results cannot be limited by search technology hamstrung by data influx or query overload. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.
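
To make the filters, proximity values, and sorting parameters mentioned under Relevance more concrete, the following is a minimal sketch in Python of how such a request might be assembled for Solr's standard select handler. The host, the `products` core, and the field names (`in_stock`, `price`, `brand`) are hypothetical and do not come from this paper; only the request parameters (`q`, `fq`, `sort`, `facet`, `facet.field`, `rows`) and the phrase-proximity syntax are standard Solr/Lucene conventions.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_solr_query(base_url, terms, filters=(), sort=None, facet_fields=(), rows=10):
    """Build a Solr select URL from a main query plus optional refinements."""
    params = [("q", terms), ("rows", str(rows))]
    # Each filter query (fq) narrows the result set without changing relevance scores.
    params += [("fq", f) for f in filters]
    if sort:
        params.append(("sort", sort))
    if facet_fields:
        # Faceting returns per-value counts that a UI can show as drill-down links.
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983/solr/products",  # hypothetical host and core name
    '"running shoes"~3',                    # proximity: both words within 3 positions
    filters=["in_stock:true", "price:[20 TO 100]"],  # hypothetical fields
    sort="price asc",
    facet_fields=["brand"],
)
print(url)
# Round-trip the URL to confirm the filters survived encoding.
print(parse_qs(urlparse(url).query)["fq"])
```

The point of the sketch is that each refinement is an independent request parameter, so an application can add or remove filters as the user drills down without asking the user to rewrite the query itself.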

Where Is Search Integrated with IT Infrastructure?

Useful, valuable search technology rarely exists in isolation. Searched data is transformed into actionable information when it is integrated with the organization’s information infrastructure, from business processes to business intelligence to content management systems. A robust search technology must be customizable to integrate with the existing systems seamlessly.

• Application Integration
  A key requirement for a search application is its extensibility for integration with existing infrastructure and applications like content management systems, databases, and the full range of business processes and applications. It should have interfaces that support ingestion of data as well as delivery of results in readily consumable formats, because in many cases results are consumed by other applications, not a human.

• Scalability
  We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the count of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, many times resulting in negative return.

• Security
  Every organization has its own security requirements and access controls.
  Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.

How Is the Search Interface Presented to the User?

The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:

• Navigation
  Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.

• Discovery
  Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in searched results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load the document are great ways to significantly improve usability.

• Intuitive Intelligence
  Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).

The Netflix search application is powered by Solr; it adds the fuzzy dimension to search, with auto-completion of movie names, correction of misspelled names of actors, and suggestions of titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.

The Real World: Applications and Case Studies

With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.

All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission critical capability that turns search into a robust competitive advantage.

For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs. Business use case studies we’ll review include:

• Yellow Pages, Local Search, and Searching Classifieds
• Media
• E-commerce
• Job and Career Sites
• Libraries, Archives, and Museums (LAMs) Search
• Social Media Search
• Enterprise (Intranet) Search

Yellow Pages, Local Search, and Searching Classifieds

In the business of online local search, geographic-based (location) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.

Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search algorithms that are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer, or pay for exposure to, proprietary back-end logic.

“Internet Yellow pages and local online search is forecast to grow to $27.8 billion in 2011.”
Source: The Kelsey Report (The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook)

Requirements
• Intelligent results going beyond keyword search
• Deeper, faceted navigation
• Seamless integration with latest Web 2.0 tools
• Lower IT-related costs
• Geocentric user experience
• Search numeric values

Solr Solution
• Customizable search index which can be tuned transparently to account for key findability drivers
• Drop down filters for narrowing or widening the scope of search
• Seamless integration with existing technologies
• Native numeric encoding and search capabilities
• Reduced server footprint for lower TCO than most commercial vendors

Success Stories
• YP.com, a division of AT&T Interactive
• Zvents.com, local event search service
• Yelp.com, the community local search site

Case Study 1
yp.com by AT&T Interactive

AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com, an online business listing and advertising site, was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.

Business Needs
• Addressing the need to factor in location to support geographic search, and include relevant comments
• Striking a balance between organic search and advertised content
• Indexing highly unstructured content such as user comments
• Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
• Linguistic support to enhance the search experience, such as spellchecking, synonyms, find-similar, etc.
• Integrating with latest Web 2.0 tools
• Reducing server footprint

The Solr Solution
• Context-specific relevancy, geographic proximity, ad placement, and user comments
• Faceting, drop down filters to narrow/widen the scope of search
• Functional support for creating new features
• Spell-correction and location-optimized search results to show users businesses nearest to them first
• Seamless integration with many Web 2.0 tools to create innovative features and mashups
• Lowers TCO by reducing the number of search servers from 120 to two dozen

Media

Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation.

Given that content is the business, the need for a robust search application ties directly to competitive advantage.

Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It addresses dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class innovative search experience to differentiate themselves in a highly competitive marketplace.

“Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.”
Doug Steigerwald, McClatchy Interactive

Requirements
• Real-time indexing of petabytes of structured and unstructured data
• Deeper search capability
• Improved query response time
• Reduced infrastructure and customization costs

Solr Solution
• Reverse indexing
• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Customization with open source code

Success Stories
• McClatchy Newspapers
• Netflix
• Comcast Interactive
• MTV Networks, a division of Viacom
• The Motley Fool, fool.com
• Fanfeedr.com, personalized sports aggregator

Case Study 2
McClatchy, Leading Newspaper Publisher

The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution, to empower the McClatchy audience with the information they wanted and secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value.

Requirements
• Proliferating content and data sources (text, videos, audios, images), with real-time streaming
• Empowering end users with ease of use
• Supporting peak traffic and popular search spikes with consistent performance
• Providing scalability for a database growing by orders of magnitude annually
• Providing flexibility to support customization
• Controlling IT costs while exceeding performance benchmarks of competition

The Lucene/Solr Solution
• Deeper content by indexing both structured and unstructured data in real time, effortlessly
• Indexes millions of documents, with search results delivered in milliseconds
• User-friendly navigation with drop down filters, faceted navigation, linguistic corrections, etc.
• Excellent  performance,  even  in  peak  hours,  by  load-­‐balancing  search  requests  across  servers     • Scalability  without  impact  on  performance     • High  degree  of  customization,  since  it’s  open  source   • Integration  with  existing  IT  infrastructure  and  eliminates  associated  license  fees  to  cut  costs   • 8-­‐fold  reduction  in  server  footprint     The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page 11
E-commerce

E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market landscape. By reducing the time and effort required to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience for customers. Search then translates directly into higher revenues and customer loyalty. Instant results, intuitively organized, advanced faceting for easy browsing, synchronizing results with images, and integration with user ratings are among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all enable users to find products they want, driving conversion rates and enabling a winning online experience.

Requirements
• Multidimensional, dynamic search
• Faster results
• Real-time indexing of products
• Faceting and browsing capabilities
• Seamless integration with existing IT infrastructure

Solr Solution
• Faceted search for deeper drill down and browsing
• Intuitive search capabilities for cross-channel shopping experience
• System administration tools for data loading, index replication, monitoring, logging, and cache management
• Query monitoring for better highlighting of popular products

Online retail sales in the B2C market are expected to reach $340 billion by 2013 [2]
Forrester Research

Success Stories
• Buy.com
• Sears.com
• Macys.com
• Zappos.com
• Advanceautoparts.com
• Dollardays.com

[2] “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.
Case Study 3
Zappos

Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than in any other shoe store, on or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution. After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to ensure continued, successful deployment.

Requirements
• Simplified, attractive user experience that makes it easy to find and buy
• Relevant results, fast
• Navigation across attributes, such as size, color, and style, for broader and deeper results
• Indexing products as they are entered in the catalogs
• Cross-functional navigation to give customers a realistic shopping experience
• Intuitive intelligence to provide alternate suggestions
• Analytical capabilities to drive business strategy
• Control over results
• Integration with existing IT infrastructure

The Solr Solution
• Search results in under a second, across categories
• Faceting, for easy browsing and discovery and a compelling user experience
• Real-time indexing of products
• Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
• Information on user activity to help build strategy on product promotions
• Controls to rank popular or high-stock products in results where users are more likely to buy them
• Integration with a heterogeneous open source environment
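The attribute navigation described in this case study (size, color, style) rests on facet counts: for the current result set, how many products carry each value of each attribute. The following is a minimal sketch in Python of that idea only. The product list, field names, and function names are hypothetical illustrations; they are not taken from the Zappos catalog and do not show how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical sample catalog; fields and values are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "color": "blue", "size": 9, "style": "running"},
    {"name": "City Loafer", "color": "brown", "size": 9, "style": "casual"},
    {"name": "Court Classic", "color": "blue", "size": 10, "style": "casual"},
    {"name": "Road Racer", "color": "red", "size": 10, "style": "running"},
]


def apply_filters(products, filters):
    """Keep only products whose fields match every selected facet value."""
    return [p for p in products if all(p.get(f) == v for f, v in filters.items())]


def facet_counts(products, fields):
    """Count how many products carry each value of each facet field."""
    return {field: Counter(p[field] for p in products) for field in fields}


if __name__ == "__main__":
    # A shopper clicks the "blue" color facet; the remaining facets are recounted.
    selected = apply_filters(PRODUCTS, {"color": "blue"})
    for field, counter in facet_counts(selected, ["size", "style"]).items():
        print(field, dict(counter))
```

Recomputing the counts after each filter is what lets a storefront show only the sizes and styles that still have matching products.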
Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability (matching résumés to job listings and vice versa), so both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to navigate through filters to narrow the scope of a candidate search, such as education, previous employer, salary history, and skillsets; a job seeker may want to expose these attributes but keep a current employer’s name confidential. A job seeker may also want to apply to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry by enabling linguistic intelligence (handling identical acronyms that correspond to different entities, variations in spelling, and imperfectly constructed search queries); indexing unstructured data (résumés); and managing ever-growing data.

Requirements
• Linguistic intelligence for more relevant results
• Control search results to maintain privacy
• Deeper search capability
• Numeric search
• Faster query response
• Reduced infrastructure and customization costs

Solr Solution
• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Business process integration and customization with open source code

“I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.”
Peter Keegan, Monster.com

Success Stories
• Monster
• The Big Jobs
• eBharatJobs
• Careerjet
Case Study 4
Monster.com

Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database and served over 63 million job seekers per month; it now runs 300 to 400 queries per second on average, with an average response time of 40 milliseconds. To provide the highest level of service and support to its customers, both employers and job seekers, Monster has built an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model.

The Requirements
• Managing high volumes of data, continually increasing by double-digit percentages annually
• Maintaining constant inventory updates and providing faster results
• Removing technological barriers that limit the scope of information
• Enabling end users to refine search and drill deeper without any performance impact
• Providing security controls to ensure end user privacy
• Facilitating scalability and flexibility in tandem with the company’s vision and growth plans

The Lucene Solution
• High data volumes handled by clustering data to reduce the index size
• Real-time indexing for fresher, faster query results
• Intuitive search to enable in-depth cross-functional job and résumé browsing
• Faceted search and ‘single click’ filters for search refinement
• Security controls to manage user information
• Unlimited scalability and customization leveraging open source licensing

Libraries, Archives, and Museums (LAMs) Search

The core asset of educational and research institutions is knowledge archived and accumulated over decades. In the world of academic search, the diversity of information for any query (text, illustration, audio/video media, or data in any other format) makes unstructured formats a key aspect of the searchable archive.

Lucene/Solr gives academic and research institutions the power to turn information into knowledge by going beyond keyword-driven search to expose a rich variety of results and exploration. Based on the open source model, it not only integrates with the existing IT infrastructure but also leverages the existing classification hierarchies to give structure to terabytes of information spread across disparate collections, significantly reducing overhead and enabling flexible and scalable deployment.

Requirements
• Management of multiple formats of data and documents
• Customization and scalability
• Linguistic support in queries
• Faster results

Solr Solution
• Optimized index infrastructure limits size without compromising speed or flexibility
• Easy customization for implementing taxonomy rules
• Faceted search to narrow results to a specific source across diverse sets of data
• Instant results
• Seamless integration with IT infrastructure for lower TCO

“With Solr, you can do so many things without writing a lick of code. I hadn't realized how easy it is to extend our custom request handler, response writer, and update handler. Just move it all to Solr and let it do the heavy lifting.”
Sjoerd Siebinga, Europeana

Success Stories
• Smithsonian Institution
• Europeana, the European Union online cultural archive
• The US Library of Congress and World Digital Library
• Stanford University Library
• University of Michigan Graduate Library
Case Study 5
Smithsonian

The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute that provides “one-stop” searching for 2 million records, including nearly a quarter of a million media files (images, online journals, and other resources) distributed across dozens of archives, databases, museums, and libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient search solution that could overcome the following challenges:

The Challenges
• Managing a complicated taxonomy that could no longer accommodate a growing data index
• Indexing disparate types of content, including documents, videos, and images
• Making information available from a large database
• Providing access controls to restrict information
• Integrating with existing legacy tools

Smithsonian chose Lucene/Solr, and worked with Lucid Imagination to create an optimized, well-designed solution.

The Solr Solution
• Efficient index strategy to manage a mix of structured and unstructured data
• Holistic search, by optimizing configuration to reduce the number of servers and better handle query requests
• Filtering information through faceted search
• Access controls to restrict information based on membership profiles
• Integration with the existing IT infrastructure
• Guidance and assistance on setting up a replicated search environment
Social Media Search

Search solutions must support differentiated business models matching Web 2.0 innovations, including user-generated content and mashups, without compromising scalability. That is a challenge, given the virtually limitless content on the Internet. Success and differentiation are measured by how well the site provides relevant results to grow its user base and keep users engaged.

Increasingly, the technological factors driving Web 2.0 application paradigms are finding their way into the enterprise, unlocking collaboration and productivity in new ways that challenge conventional organizational bounds, and that rely in equal measure on search to create the connections between employees to enable discovery, cross-pollination, and more efficient collective effort.

Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to help end users connect with others. It boosts the reach and performance of search, while cutting implementation costs and lowering barriers to innovation.

Requirements
• Deliver search results as soon as content is available
• Deeper drill down capabilities
• Intuitive interface

Lucene/Solr Solution
• Near-instant results with segmentable indexing
• Intuitive search
• Data-driven spellchecking based on user search histories
• Linguistic support through “Did you mean” functionality
• Highlighting keywords
• Deeper drill down with faceting
• Real-time content updating

“With Solr, we really treat it as kind of a platform where we can build other kind of things on top of it… We have a very valuable set of data, and we really want to explore new ways of building new features from that data set.”
Sammy Yu, Digg.com

Success Stories
• Digg
• Myspace
• LinkedIn
• Reddit
• Technorati
• Scout Labs
• Xmarks.com
Case Study 6
Digg.com

Digg displays the wisdom of the crowds. By leveraging the mass collaboration of readers distributed across the Internet (everything on Digg is submitted by the public community for the public community), it builds on the easy findability of information valued by the marketplace of readers and consumers.

Digg realized early on that to succeed in the business of information, they needed to make information available to their audience as effortlessly as possible. They saw the following challenges as roadblocks to implementing a base search application:

Requirements
• Managing unstructured data (13 million documents and growing) in real time
• Providing results faster
• Facilitating smart navigation to provide information in digestible portions
• Recognizing and eliminating duplicate content
• Providing semantic and linguistic intelligence
• Facilitating scalability while containing costs

Digg selected Solr for its unmatched flexibility and functionality.

The Solr Solution
• Highly customizable and flexible
• Results in under a second, with simple-to-use pull-downs to refine results
• Fuzzy duplicate detection (by coding)
• Unlimited scalability and seamless integration with the heterogeneous environment
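The case study does not say how the fuzzy duplicate detection was coded. As an illustration of the general idea only, the sketch below uses one common technique: compare documents by the overlap of their word shingles (Jaccard similarity). The function names, the shingle length, and the 0.5 threshold are assumptions made for this example, not details of Digg's or Solr's implementation.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.5, k=3):
    """Treat two texts as near-duplicates when their shingle overlap is high."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


if __name__ == "__main__":
    first = "open source search scales to millions of documents"
    second = "open source search scales to millions of pages"
    print(is_near_duplicate(first, second))
```

A pairwise comparison like this only suits small collections; at the scale of millions of documents, a production system would need an indexing or hashing scheme to avoid comparing every pair.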
Case Study 7
LinkedIn

Connecting 50 million registered users from 200 countries across 170 industries and matching them to the right professional contacts is what LinkedIn is all about. LinkedIn’s business is premised on an intelligent search application that can overcome the following challenges:

The Challenges
• Managing an ever-growing database, with one new member joining and creating a profile every second
• Indexing unstructured data in real time
• Giving instant query responses, even in peak traffic hours
• Providing intuitive navigation and intelligent linguistic support
• Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple sources

They chose Lucene to implement the search function at the core of their business model.

The Lucene Solution
• Used index segmentation for faster results and to limit index base
• Provided faceted search and intelligence support features like changing the view of search results and auto-completion of contacts
• Calculated relative relevance, ranking results on the fly based on the relationship between the user’s profile and the other profiles being searched
• Integrated with the latest web tools; for example, incorporating videos in search results
• Provided a “scale as you grow” facility through the flexibility of the open source model
Enterprise (Intranet) Search

Enterprises today have a global footprint, which leads to the creation of multiple content types and the use of disparate applications and content management systems across business centers. The result is often silos of unmanaged data spread across the intranet of an enterprise, a situation where information is omnipresent but cannot be used.

To achieve a competitive advantage, enable intelligent decision-making, eliminate duplication of work, and lower the cost of ownership, enterprises need a search application that gives structure to unstructured data and provides a single gateway to search across multiple enterprise repositories, with speed, flexibility, and intuitive intelligence.

Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search application, Lucene/Solr provides robust search features at minimal cost. The open source development model behind Lucene/Solr integrates seamlessly with legacy tools, and brings down the total cost of ownership significantly.

Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level, role-based security. And with transparent search algorithms and configurable relevancy, Lucene/Solr enables intranet search with the precise control enterprise content owners require, ensuring that results consistently deliver the right documents to the right people.

Requirements
• Single interface to access enterprise data
• Faster results
• Control over search results
• Ready integration with existing content management software

Solr Solution
• Single gateway for all types of data
• Dynamic boosting of content
• Transparent search algorithms and relevancy tuning
• Customization and easy integration with open source code

“The search and discovery software market grew 19 percent in 2008 to $2.1 billion”
Sue Feldman, IDC
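One way document-level, role-based security is commonly approached with Solr is to index the roles allowed to read each document and attach a filter query (the `fq` request parameter) to every search. The sketch below only builds such request parameters. The field name `acl_roles` and the function names are hypothetical, and the paper does not describe which mechanism any particular deployment uses.

```python
from urllib.parse import urlencode


def _quote(value):
    """Wrap a role name in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def role_filter(roles, field="acl_roles"):
    """Build a filter clause matching documents readable by any given role.

    `acl_roles` is a hypothetical field assumed to be populated at index time.
    """
    if not roles:
        raise ValueError("at least one role is required")
    return field + ":(" + " OR ".join(_quote(r) for r in sorted(roles)) + ")"


def secured_params(user_query, roles):
    """Combine the user's query with the role filter as URL-encoded parameters."""
    return urlencode({"q": user_query, "fq": role_filter(roles)})


if __name__ == "__main__":
    print(role_filter({"hr", "finance"}))
    print(secured_params("budget", ["hr"]))
```

Keeping the restriction in `fq` rather than in `q` leaves relevance scoring driven by the user's query alone, while the filter only narrows which documents are eligible.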
Case Study 8
Food and Drug Administration

The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The FDA has a large repository of information that dates back multiple decades, and exists in formats ranging from early optical character recognition to recent electronic formats. To mine this knowledge base, the FDA is developing a semantic mining framework using open source tools such as Apache Lucene and Solr.

Requirements
• Integrating petabytes of data highly distributed across the intranet of an enterprise
• Managing multiple indices for documents stored in distributed repositories
• Managing and maintaining archival data and evolving vocabularies
• Indexing unstructured data in real time
• Recognizing and eliminating duplicate content
• Handling concurrent queries and delivering fast and relevant results
• Restricting search results according to agency access control policies
• Integrating with existing infrastructure without additional overhead

The Lucene Solution
• A single gateway to search across multiple enterprise repositories
• Duplicate detection
• Fast and relevant results with content analysis and query interpretation algorithms
• Filtering of results based on the access controls and security policies of an enterprise
• Integration with existing enterprise infrastructure to reduce TCO
Business Use Case Matrix

To simplify mapping your search needs to existing search applications in the real world, the matrix below compares business use cases against key search requirements. While not an exhaustive list, the matrix highlights the different business use cases across sectors and business models, reflecting the adaptability of Lucene/Solr across the various domains of search applications and use cases.

Columns: Users (Customer Facing, Internal); Content (Original, Aggregated); Content Update Frequency (High, Medium, Low); Access Control

Verticals
Enterprise (Intranet): √ √ √ √
Education, Schools/Universities: √ √ √ √ √ √
Education, Libraries: √ √ √ √ √
Job Portals: √ √ √ √
Social Networks: √ √ √ √ √
Media, News: √ √ √ √
Media, Media: √ √ √ √
E-Commerce Sites: √ √ √ √ √ √
Financial Services: √ √ √ √ √
Yellow Pages: √ √ √
Horizontal Portals: √ √ √ √
Appendix: Lucene/Solr Features and Benefits

Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing the search solution best suited to your requirements, key factors to consider are application scope, development environment, and software development preferences.

Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.

Solr is the Lucene Search Server. It presents a web service layer built atop Lucene, using the Lucene search library and extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.

Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.

Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment.

With convenient REST-like web-service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance. In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application according to their requirements, without incurring the cost and risk of writing the code from scratch.

Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile inside a native Java application. While working with Lucene, programmers can directly control the large set of sophisticated features with low-level access, data, or state manipulation.

Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
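To make the HTTP interface described above concrete, the sketch below builds the URL for a Solr search request with faceting enabled. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters; the host, port, and `/solr/select` path are assumptions about a default local installation, and the function name and example field names are hypothetical. The code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; host, port, and path depend on the deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, facet_fields=(), rows=10, base_url=SOLR_SELECT_URL):
    """Return a Solr search URL for the query, optionally requesting facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter is sent per field to be counted.
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "color" and "size" are illustrative field names, not part of any real schema.
    print(build_select_url("shoes", ["color", "size"], rows=5))
```

Because the request is an ordinary HTTP GET, the same URL can be pasted into a browser or issued from any language with an HTTP client, which is what keeps Solr independent of the application's own technology stack.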
As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include:

• Search Quality: Speed, Relevance, and Precision. Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search help users in sorting, organizing, classifying, and structuring retrieved information to ensure that search delivers desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.

• Lower Cost and Greater Flexibility, Plug and Play Architecture. Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing costs of installation, configuration, and management.

• Open Source Platform for Portability and Easy Deployment. Because Lucene/Solr is an open source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux and copy it to a Microsoft Windows machine and search there. This portability enables you to keep your search application and your company’s evolving infrastructure in tandem. Lucene, in turn, has been implemented in other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems.

• Largest Installed Base of Applications, Increasing Customer Base. Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
• Large Developer Base and Adaptability. As community-developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm lets Lucene/Solr provide developers with the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors.

• Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination. Lucid Imagination provides the expertise, resources, and services that are needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
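Several of the query features listed under Search Quality above (fielded searching, proximity operators, wildcards, and term weights) correspond to operators in Lucene's classic query syntax: `field:term`, `"phrase"~N`, `term*`, and `term^N`. The small helpers below compose such query strings. The helper names and the `title` field are hypothetical and are not part of any Lucene or Solr API; only the resulting query syntax is Lucene's.

```python
def phrase(text, slop=None):
    """Quote a phrase; with slop, allow its words to be up to `slop` positions apart."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    quoted = '"' + escaped + '"'
    return quoted if slop is None else quoted + "~" + str(slop)


def fielded(field, clause):
    """Restrict a clause to one indexed field."""
    return field + ":" + clause


def boosted(clause, weight):
    """Give a clause more weight in relevance scoring."""
    return clause + "^" + str(weight)


def prefix(term):
    """Match any term starting with the given characters."""
    return term + "*"


if __name__ == "__main__":
    # "title" is an illustrative field name.
    print(fielded("title", phrase("open source", slop=5)))
    print(boosted(fielded("title", "solr"), 2))
    print(prefix("sol"))
```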