The Case for Lucene/Solr:
A Manager’s Guide to Real World Open Source Search Applications

By Lucid Imagination

Abstract
In today’s information-driven environment, search is a critical solution to problems when it slashes the time and effort separating end users from the data they value. Search spans the range of business models and use cases, from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead. Making the best use of search requires two perspectives: a look at the business requirements for a search application, and a view to new business opportunities created by using search to leverage the organization’s content resources.

Thousands of organizations across different sectors and business models have harnessed Apache Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this broad adoption is the extraordinary power, scalability, and versatility of open source search technologies.

This paper provides an overview of both the requirements and the opportunities for search applications. It then explores how real world organizations are successfully using Lucene/Solr search applications to meet those opportunities, presenting how the technology is used for specific business models and use cases across industries. In addition, it offers a baseline for setting search requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open source search technology to the unique needs of their business.

© 2010, Lucid Imagination




The Case for Lucene/Solr: Real World Search Applications
A Lucid Imagination White Paper • January 2010 	




	
  


Table of Contents

Introduction
Understanding Search Opportunities and Requirements
    What Data and Documents Are You Searching?
    Who Needs the Results and Why?
    Where Is Search Integrated with IT Infrastructure?
    How Is the Search Interface Presented to the User?
The Real World: Applications and Case Studies
    Yellow Pages, Local Search, and Searching Classifieds
    Media
    E-commerce
    Job and Career Sites
    Libraries, Archives, and Museums (LAMs) Search
    Social Media Search
    Enterprise (Intranet) Search
Business Use Case Matrix
Appendix: Lucene/Solr Features and Benefits
  
	
  








Introduction
As fast as companies, communities, and consumers produce data (about each other, products, opinions, research, and everything else imaginable), they need faster, more versatile search capabilities to find the information they need to create opportunities for competitive advantage. In today’s information-driven environment, search addresses the critical problems created by the explosive growth of content by slashing the time and effort users expend in finding data they value. Search spans the range of business models and use cases, from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead.

Apache Lucene/Solr[1] open source search technology has been implemented across the broadest range of applications and business models, likely in ways that can fit the needs of your organization. In successful operation today at thousands of enterprises, Lucene/Solr technology scales from tens of thousands to billions of documents; searches structured data, unstructured data, and combinations of the two; reaches data inside and outside the firewall; and ranges in use from a simple website search box through sophisticated faceted navigation. It addresses equally diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr helps users find, make sense of, and act upon information quickly and efficiently.

In this white paper, we’ll review real-world case studies for Lucene/Solr functionality across business sectors to demonstrate its versatility and varied applicability. The diversity of examples provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The examples also attest to the innovation and transparency inherent to the open source development model. Our focus is on familiarizing the audience of business managers and application owners with existing Lucene/Solr applications; the substantial technical advantages to developers are covered elsewhere.

[1] Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix.




We’ll first survey the key requirements and business use cases of search and then look at where they are built into search applications. Our objective is to provide business managers and application owners with a broad perspective on how Lucene/Solr search technology is used to build solutions to compelling business problems. In the Appendix, we provide an overview of Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the broadest range of business needs.


Understanding Search Opportunities and Requirements
Search technology has come a long way from its roots in matching keywords against their appearances in documents and returning undifferentiated results. Search today empowers users by delivering actionable information quickly and efficiently, across multiple, diverse sources of data. The business use cases range from executing mission critical commercial transactions (e.g., e-commerce sites) to unlocking employee and end-user productivity in the search for a single relevant document (e.g., enterprise search).

Given the breadth of the problem domain, it’s useful to look at search and ask two fundamental questions: “How can it solve my business problems?” and “What new business opportunities can search create?”

In considering how search technology solves business problems, it is useful to start by spelling out the requirements you’ll need to consider for your search application. At the same time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as it can help open up new frontiers for incorporating search and leveraging more value from data repositories.

Starting with some basic questions (what, who, where, and how), you can clarify the high-level business requirements specific to your business needs, which in turn allow you to make the best decisions for your search application. The process of looking at the fundamentals also raises new questions about how and where the search technology offered by Lucene and Solr can create new business opportunities.

Let’s look at four fundamental questions you should address in understanding search opportunities and requirements:

       •   What data and documents are you searching?
       •   Who needs the results and why?
       •   Where is search integrated with IT infrastructure?
       •   How is the search interface presented to the user?




  




What Data and Documents Are You Searching?
Business today is driven more than ever by end users’ creation and consumption of real-time information. A key differentiating capability of search technology is ingesting a broad range of content types and processing large collections of diverse data in real time in order to deliver actionable information. Two aspects to consider:

       •   Types of Content
           Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint presentations, Excel spreadsheets, Word documents, log files, multimedia content, and more. Content resides in various repositories, including databases, file servers, content management systems, archiving systems, collaboration applications, and employee desktops and laptops. Search technology must be able to locate, organize, and aggregate data whatever its form or location.
       •   Frequency of Updating Content
           Organizations update content at varying intervals, driven by differing business processes and models. Social media or news applications have real-time content needs, whereas an e-commerce application might re-index in response to new inventory on a batch basis, and a research institution might add to its collection less often still. Search applications need to be adaptable to these differences in content change frequency.


Who Needs the Results and Why?
Business search puts a high priority on the end-user experience and on results in which the searched content is tuned to the unique needs of each user. After all, the human dimension (the usefulness of results and the efficacy of interaction) is the acid test of a search application. Internet search applications like Google, Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities of the search experience, but they solve a very different problem.

While Internet searches can produce millions of results in milliseconds, they rely on measures like website popularity, URLs, and domain names, which are neither relevant nor generally applicable to purpose-built applications for businesses. What’s more, they generalize relevancy for a global population of all Internet users, without being tied to business rules, business process logic, or the opportunity cost of improved precision for a specific set of data or search users.

Business search applications cannot rely on such coarse, brute-force approaches to tune their results. They need far more control and precision. They have to deliver highly useful results while matching, if not exceeding, the level of user experience that people have come to expect from their daily interactions with commercial search engines. Key points of consideration from a business perspective are:

  



       •   Relevance
           Relevance depends entirely on the goals of the search application’s users. The application must have the mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easy ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper makes results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow search scope can also lead to a richer set of more useful results, with less time and effort.
       •   Cost of Relevance
           Because business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to searches, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should make employees more productive by reducing the time and effort needed to find the documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.
       •   Precision Ranking
           Accurate results, sorted by attributes like relevance, date, field, or any other document property, make the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.
       •   Query Response Speed
           Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. The delivery of fast, relevant results must not be limited by search technology that is hamstrung by data influx or query overload. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.




  




Where Is Search Integrated with IT Infrastructure?
Useful,	
  valuable	
  search	
  technology	
  rarely	
  exists	
  in	
  isolation.	
  Searched	
  data	
  is	
  transformed	
  into	
  
actionable	
  information	
  when	
  it	
  is	
  integrated	
  with	
  the	
  organization’s	
  information	
  infrastructure:	
  
business	
  process	
  to	
  business	
  intelligence	
  to	
  content	
  management	
  systems.	
  A	
  robust	
  search	
  
technology	
  must	
  be	
  customizable	
  to	
  integrate	
  with	
  the	
  existing	
  systems	
  seamlessly.	
  	
  
       •     Application	
  Integration	
  
             A	
  key	
  requirement	
  for	
  a	
  search	
  application	
  is	
  its	
  extensibility	
  for	
  integration	
  with	
  existing	
  
             infrastructure	
  and	
  applications	
  like	
  content	
  management	
  systems,	
  databases,	
  and	
  the	
  full	
  
             range	
  of	
  business	
  processes	
  and	
  applications.	
  It	
  should	
  have	
  interfaces	
  that	
  support	
  
             ingestion	
  of	
  data	
  as	
  well	
  as	
  delivery	
  of	
  results	
  in	
  readily	
  consumable	
  formats—because	
  in	
  
             many	
  cases,	
  results	
  are	
  consumed	
  by	
  other	
  applications,	
  not	
  a	
  human.	
  
     • Scalability
       We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the count of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, often resulting in a negative return.
     • Security
       Every organization has its own security requirements and access controls. Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.
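
One common way to honor document-level access controls is to attach a filter to every query so that only documents the user may read can match. The sketch below is illustrative only: it assumes each indexed document carries a hypothetical `acl_groups` field listing the groups allowed to read it, and it uses Solr's standard `q`, `fq`, and `wt` request parameters. The field name and the group names are assumptions, not details from any deployment described here.

```python
from urllib.parse import urlencode


def secured_params(user_query, user_groups, acl_field="acl_groups"):
    """Build Solr-style request parameters limited to the user's groups.

    `acl_field` is a hypothetical index field naming the groups that may
    read each document; adjust it to whatever the real schema defines.
    """
    if not user_groups:
        # Without any group we cannot build a meaningful filter.
        raise ValueError("user must belong to at least one group")
    # Quote each group so that names containing spaces stay one term.
    allowed = " OR ".join('"{}"'.format(group) for group in user_groups)
    return {
        "q": user_query,                             # what the user typed
        "fq": "{}:({})".format(acl_field, allowed),  # access-control filter
        "wt": "json",                                # machine-readable output
    }


params = secured_params("quarterly report", ["finance", "executives"])
print(urlencode(params))  # query string to append to the search URL
```

Keeping the restriction in a separate filter rather than in the user's query text means the access rule cannot be edited away by the person searching.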
  	
  


How Is the Search Interface Presented to the User?
The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:




     • Navigation
       Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop-down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.
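
To make the idea of faceted navigation concrete, the sketch below counts how many matching documents fall under each value of a field, which is the information a facet sidebar displays. It is plain Python over an in-memory list, not Solr's own implementation, and the sample titles and the `genre` field are made up for illustration.

```python
from collections import Counter


def facet_counts(documents, field):
    """Return (value, count) pairs for `field`, most frequent first."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is not None:  # skip documents that lack the field
            counts[value] += 1
    return counts.most_common()


# Hypothetical search results; a real engine would supply these.
results = [
    {"title": "Heat", "genre": "Crime"},
    {"title": "Casino", "genre": "Crime"},
    {"title": "Alien", "genre": "Science Fiction"},
    {"title": "Untitled draft"},  # no genre recorded
]

for value, count in facet_counts(results, "genre"):
    print("{} ({})".format(value, count))
```

Each printed line corresponds to a link the user could click to narrow the result set to that value.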
  
           	
  


The Netflix search application is powered by Solr; it adds the fuzzy dimension to search, with auto-completion of movie names, correction of misspelled names of actors, and suggestions of titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.
              	
  
     • Discovery
       Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in searched results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load the document are great ways to significantly improve usability.
	
  
     • Intuitive Intelligence
       Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).
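
A small sketch can show what handling spelling errors means in practice: given a mistyped query, suggest the closest known terms. The example uses the `difflib` module from Python's standard library as a stand-in; it is not the spell-checking technique Lucene/Solr uses, and the term list is hypothetical.

```python
from difflib import get_close_matches


def did_you_mean(query, known_terms, max_suggestions=3):
    """Suggest known terms that closely resemble a possibly misspelled query."""
    # cutoff=0.6 is difflib's default similarity threshold (0 to 1);
    # raising it yields fewer, closer suggestions.
    return get_close_matches(query.lower(), known_terms,
                             n=max_suggestions, cutoff=0.6)


# Hypothetical vocabulary, e.g. titles already present in the index.
titles = ["matrix", "memento", "metropolis"]
print(did_you_mean("matrx", titles))
```

A production system would typically draw its vocabulary from the index itself, so suggestions always lead to documents that exist.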
  

The Real World: Applications and Case Studies
With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.

All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission-critical capability that turns search into a robust competitive advantage.

For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs. Business use case studies we’ll review include:
     • Yellow Pages, Local Search, and Searching Classifieds
     • Media
     • E-commerce
     • Job and Career Sites
     • Libraries, Archives, and Museums (LAMs) Search
     • Social Media Search
     • Enterprise (Intranet) Search




Yellow Pages, Local Search, and Searching Classifieds
In the business of online local search, geographic-based (location) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.

Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search algorithms that are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer or pay for exposure to proprietary back-end logic.

     “Internet Yellow pages and local online search is forecast to grow to $27.8 billion in 2011.”
     The Kelsey Report¹

Requirements
     • Intelligent results going beyond keyword search
     • Deeper, faceted navigation
     • Seamless integration with latest Web 2.0 tools
     • Lower IT-related costs
     • Geocentric user experience
     • Search numeric values

Solr Solution
     • Customizable Search Index which can be tuned transparently to account for key findability drivers
     • Drop-down filters for narrowing or widening the scope of search
     • Seamless integration with existing technologies
     • Native numeric encoding and search capabilities
     • Reduced server footprint for lower TCO than most commercial vendors

Success Stories
     • YP.com, a division of AT&T Interactive
     • Zvents.com, local event search service
     • Yelp.com, the community local search site

¹ The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook




Case Study 1

yp.com by AT&T Interactive

AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com, an online business listing and advertising site, was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.
Business Needs
     • Addressing the need to factor in location to support geographic search, and include relevant comments
     • Striking a balance between organic search and advertised content
     • Indexing highly unstructured content such as user comments
     • Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
     • Providing linguistic support to enhance the search experience, such as spellchecking, synonyms, and find-similar
     • Integrating with latest Web 2.0 tools
     • Reducing server footprint
The Solr Solution
     • Context-specific relevancy, geographic proximity, ad placement, and user comments
     • Faceting and drop-down filters to narrow or widen the scope of search
     • Functional support for creating new features
     • Spell correction and location-optimized search results that show users the businesses nearest to them first
     • Seamless integration with many Web 2.0 tools to create innovative features and mashups
     • Lower TCO by reducing the number of search servers from 120 to two dozen
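
As an illustration of location-optimized results, the sketch below orders business listings by great-circle distance from the user, so the nearest appear first. It is a standalone Python example using the haversine formula, not a description of how yp.com or Solr implements geographic ranking, and the listing names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_first(listings, user_lat, user_lon):
    """Return listings sorted by distance from the user, closest first."""
    return sorted(
        listings,
        key=lambda b: haversine_km(user_lat, user_lon, b["lat"], b["lon"]),
    )


# Hypothetical listings with approximate coordinates.
listings = [
    {"name": "Shop in San Jose", "lat": 37.34, "lon": -121.89},
    {"name": "Shop in San Francisco", "lat": 37.78, "lon": -122.41},
    {"name": "Shop in Oakland", "lat": 37.80, "lon": -122.27},
]

for shop in nearest_first(listings, 37.77, -122.42):
    print(shop["name"])
```

In practice distance would be only one ranking signal, blended with text relevance, reviews, and paid placement as the bullets above describe.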
  	
  




Media
Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation.

Given that content is the business, the need for a robust search application ties directly to competitive advantage.

Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It addresses dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class innovative search experience to differentiate themselves in a highly competitive marketplace.

     “Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.”
     Doug Steigerwald, McClatchy Interactive

Requirements
     • Real-time indexing of petabytes of structured and unstructured data
     • Deeper search capability
     • Improved query response time
     • Reduced infrastructure and customization costs

Solr Solution
     • Reverse indexing
     • Intelligent, faceted search to enable contextual and linguistic relevance
     • Easy configuration for parsing structured and unstructured data
     • Easy and seamless installation for lower TCO
     • Customization with open source code

Success Stories
     • McClatchy Newspapers
     • Netflix
     • Comcast Interactive
     • MTV Networks, a division of Viacom
     • The Motley Fool, fool.com
     • Fanfeedr.com, personalized sports aggregator


Case Study 2

McClatchy: Leading Newspaper Publisher

The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution to empower the McClatchy audience with the information they wanted and to secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value.
Requirements
     • Proliferating content and data sources (text, videos, audios, images), with real-time streaming
     • Empowering end users with ease of use
     • Supporting peak traffic and popular search spikes with consistent performance
     • Providing scalability for a database growing by orders of magnitude annually
     • Providing flexibility to support customization
     • Controlling IT costs while exceeding performance benchmarks of competition
The Lucene/Solr Solution

• Deeper content, by indexing both structured and unstructured data in real time, effortlessly
• Indexes millions of documents, with search results delivered in milliseconds
• User-friendly navigation with drop-down filters, faceted navigation, linguistic corrections, etc.
• Excellent performance, even at peak hours, by load-balancing search requests across servers
• Scalability without impact on performance
• High degree of customization, since it is open source
• Integration with existing IT infrastructure, eliminating associated license fees to cut costs
• 8-fold reduction in server footprint



The Case for Lucene/Solr: Real World Search Applications
A Lucid Imagination White Paper • January 2010 	




E-commerce

E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market landscape. By reducing the time and effort customers need to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience. Search then translates directly into higher revenues and customer loyalty. Instant, intuitively organized results, advanced faceting for easy browsing, synchronization of results with images, and integration with user ratings are among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all enable users to find the products they want, driving conversion rates and enabling a winning online experience.²

Online retail sales in the B2C market are expected to reach $340 billion by 2013.²
Forrester Research

Success Stories

• Buy.com
• Sears.com
• Macys.com
• Zappos.com
• Advanceautoparts.com
• Dollardays.com

Requirements

• Multidimensional, dynamic search
• Faster results
• Real-time indexing of products
• Faceting and browsing capabilities
• Seamless integration with existing IT infrastructure

Solr Solution

• Faceted search for deeper drill-down and browsing
• Intuitive search capabilities for a cross-channel shopping experience
• System administration tools for data loading, index replication, monitoring, logging, and cache management
• Query monitoring for better highlighting of popular products

² “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.


Case Study 3

Zappos

Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than at any other shoe store, online or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search application. After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to ensure continued, successful deployment.

Requirements

• Simplified, attractive user experience that makes it easy to find and buy
• Relevant results, fast
• Navigation across attributes such as size, color, and style, for broader and deeper results
• Indexing products as they are entered in the catalogs
• Cross-functional navigation to give customers a realistic shopping experience
• Intuitive intelligence to provide alternate suggestions
• Analytical capabilities to drive business strategy
• Control over results
• Integration with existing IT infrastructure

The Solr Solution

• Subsecond search results, across categories
• Faceting, for easy browsing and discovery and a compelling user experience
• Real-time indexing of products
• Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
• Information on user activity to help build strategy on product promotions
• Controls to rank popular or high-stock products in results where users are more likely to buy them
• Integration with a heterogeneous open source environment






Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability (matching résumés to job listings and vice versa) so that both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to navigate through filters such as education, previous employer, salary history, and skill sets to narrow the scope of a candidate search; a job seeker may want to expose these attributes but keep a current employer’s name confidential. A job seeker may also want to apply to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry by enabling linguistic intelligence (handling identical acronyms that correspond to different entities, variations in spelling, and imperfectly constructed search queries); indexing unstructured data (résumés); and managing ever-growing data.

“I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.”
Peter Keegan, Monster.com

Success Stories

• Monster
• The Big Jobs
• eBharatJobs
• Careerjet

Requirements

• Linguistic intelligence for more relevant results
• Control of search results to maintain privacy
• Deeper search capability
• Numeric search
• Faster query response
• Reduced infrastructure and customization costs

Solr Solution

• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Business process integration and customization with open source code

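
The employer-side filtering described in this section (narrowing by education and salary while keeping a candidate's current employer out of view) can be sketched as a single search request. The Python below is a minimal illustration under stated assumptions: the endpoint URL, the field names (`education`, `skills`, `salary`, `location`, `current_employer`), and the function name are all hypothetical. It relies only on standard Solr request parameters (`q`, `fq`, `fl`, `wt`) and the standard Lucene range syntax `field:[low TO high]`. Restricting returned fields with `fl` is shown only to convey the idea; a production privacy model would need more than this.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; a real deployment's host and path will differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Assumed schema: fields a candidate agrees to expose to employers.
# The confidential "current_employer" field is deliberately left out.
PUBLIC_FIELDS = ("id", "education", "skills", "salary", "location")


def build_candidate_search(keywords, education=None, salary_range=None):
    """Return a Solr select URL for an employer's candidate search.

    keywords     -- free-text query, e.g. "java developer"
    education    -- optional exact-match filter on the education field
    salary_range -- optional (low, high) tuple for a numeric range filter
    """
    params = [
        ("q", keywords),
        ("wt", "json"),
        ("fl", ",".join(PUBLIC_FIELDS)),  # return only the exposed fields
    ]
    if education:
        params.append(("fq", f'education:"{education}"'))
    if salary_range:
        low, high = salary_range
        # Lucene range query: inclusive lower and upper bounds.
        params.append(("fq", f"salary:[{low} TO {high}]"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_candidate_search("java developer", "BS", (50000, 80000)))
```

Keeping each filter in its own `fq` parameter lets the employer add or remove one criterion at a time without rewriting the main query.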

Monster.com

Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database and served over 63 million job seekers per month; it now runs 300 to 400 queries per second on average, with an average response time of 40 milliseconds. To provide the highest level of service and support to its customers, both employers and job seekers, Monster offers an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model.

The Requirements

• Managing high volumes of data, continually increasing by double-digit percentages annually
• Maintaining constant inventory updates and providing faster results
• Removing technological barriers that limit the scope of information
• Enabling end users to refine searches and drill deeper without any performance impact
• Providing security controls to ensure end-user privacy
• Facilitating scalability and flexibility in tandem with the company’s vision and growth plans

The Lucene Solution

• Handles high volumes of data by clustering data to reduce the index size
• Real-time indexing for fresher, faster query results
• Intuitive search to enable in-depth, cross-functional job and résumé browsing
• Faceted search and ‘single click’ filters for search refinement
• Security controls to manage user information
• Unlimited scalability and customization, leveraging open source licensing

       	
  



Case Study 4

Libraries, Archives, and Museums (LAMs) Search

The core asset of educational and research institutions is knowledge archived and accumulated over decades. In the world of academic search, the diversity of information for any query (text, illustration, audio/video media, or data in any other format) makes unstructured formats a key aspect of the searchable archive.

Lucene/Solr gives academic and research institutions the power to turn information into knowledge by going beyond keyword-driven search to expose a rich variety of results and exploration. Based on the open source model, it not only integrates with the existing IT infrastructure but also leverages the existing classification hierarchies to give structure to terabytes of information spread across disparate collections, significantly reducing overhead and enabling flexible and scalable deployment.

Requirements

• Management of multiple formats of data and documents
• Customization and scalability
• Linguistic support in queries
• Faster results

Solr Solution

	
                                                                                                                   •      Optimized	
  index	
  
                                                                                                                            infrastructure	
  limits	
  
	
                               “With	
  Solr,	
  you	
  can	
  do	
  so	
  many	
  things	
                               size	
  without	
  
	
  
                                 without	
  writing	
  a	
  lick	
  of	
  code.	
  I	
  hadn't	
                            compromising	
  speed	
  
                                 realized	
  how	
  easy	
  it	
  is	
  to	
  extend	
  our	
                               or	
  flexibility	
  
	
                               custom	
  request	
  handler,	
  response	
                                         •      Easy	
  customization	
  
                                                                                                                            for	
  implementing	
  
	
                               writer,	
  and	
  update	
  handler.	
  Just	
  move	
                                     taxonomy	
  rules	
  
	
                               it	
  all	
  to	
  Solr	
  and	
  let	
  it	
  do	
  the	
  heavy	
                 •      Faceted	
  search	
  to	
  
	
                               lifting.”	
                                                                                narrow	
  results	
  to	
  a	
  
                                                                                                                            specific	
  source	
  across	
  
	
                                                       Sjored	
  Siebinga,	
  Europeana	
                                 diverse	
  sets	
  of	
  data	
  
                                                                                                                     •      Instant	
  results	
  
Success	
  Stories	
  
                                                                                                                     •      Seamless	
  integration	
  
       •   Smithsonian	
  Institute	
  	
                                                                                   with	
  IT	
  
       •   Europeana,	
  the	
  European	
  Union	
  online	
  cultural	
  archive	
                                        infrastructure	
  for	
  
       •   The	
  US	
  Library	
  of	
  Congress	
  and	
  World	
  Digital	
  Library	
                                   lower	
  TCO	
  
       •   Stanford	
  University	
  Library	
                                                                       	
  
       •   University	
  of	
  Michigan	
  Graduate	
  Library	
  
	
                                                                                                                   	
  

	
  
	
  
	
  


Case Study 5

       Smithsonian

       The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute that provides “one-stop” searching for 2 million records, including nearly a quarter of a million files (images, media files, online journals, and other resources) distributed across dozens of archives, databases, museums, and libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient search solution that could overcome the following challenges:

       The Challenges
              •   Managing a complicated taxonomy that could no longer accommodate a growing data index
              •   Indexing disparate types of content, including documents, videos, and images
              •   Making information available from a large database
              •   Providing access controls to restrict information
              •   Integrating with existing legacy tools

       The Smithsonian chose Lucene/Solr and worked with Lucid Imagination to create an optimized, well-designed solution.

       The Solr Solution
              •   Efficient index strategy to manage a mix of structured and unstructured data
              •   Holistic search, with configuration optimized to reduce the number of servers and better handle query requests
              •   Filtering information through faceted search
              •   Access controls to restrict information based on membership profiles
              •   Integration with the existing IT infrastructure
              •   Guidance and assistance on setting up a replicated search environment
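
For readers who want to see what filtering through faceted search can look like in practice, the following is a minimal sketch in Python of how a faceted Solr request might be assembled. It is not the Smithsonian's implementation. The request parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `wt`) are standard Solr parameters; the server address and the field names `media_type` and `collection` are assumptions made purely for illustration.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, filters=None):
    """Build a Solr select URL that asks for facet counts and applies filters."""
    params = [
        ("q", text),              # the user's search terms
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # turn faceting on
        ("facet.mincount", "1"),  # hide facet values with no matching documents
    ]
    # One facet.field entry per field whose value counts should be returned.
    for field in facet_fields:
        params.append(("facet.field", field))
    # Each selected facet value becomes a filter query that narrows the results.
    # Values are wrapped in quotes; a real application would also escape them.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical server address and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr",
    "lincoln portrait",
    facet_fields=["media_type", "collection"],
    filters={"media_type": "image"},
)
print(url)
```

Each click on a facet value adds one more filter to the request, which is how a "single click" refinement narrows a large result set without the user retyping the query.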
  
                       	
  
       	
  


Social Media Search
Search solutions must support differentiated business models matching Web 2.0 innovations, including user-generated content and mashups, without compromising scalability, a challenge given the virtually limitless content on the Internet. Success and differentiation are measured by how well the site provides relevant results to grow its user base and keep users engaged. Increasingly, the technological factors driving Web 2.0 application paradigms are finding their way into the enterprise, unlocking collaboration and productivity in new ways that challenge conventional organizational bounds, and that rely in equal measure on search to create the connections between employees that enable discovery, cross-pollination, and more efficient collective effort.

Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to help end users connect with others. It boosts the reach and performance of search, while cutting implementation costs and lowering barriers to innovation.

“With Solr, we really treat it as kind of a platform where we can build other kind of things on top of it… We have a very valuable set of data, and we really want to explore new ways of building new features from that data set.”
Sammy Yu, Digg.com

Success Stories
       •   Digg
       •   Myspace
       •   LinkedIn
       •   Reddit
       •   Technorati
       •   Scout Labs
       •   Xmarks.com

Requirements
       •   Deliver search results as soon as content is available
       •   Deeper drill down capabilities
       •   Intuitive interface

Lucene/Solr Solution
       •   Near-instant results with segmentable indexing
       •   Intuitive search
       •   Data-driven spellchecking based on user search histories
       •   Linguistic support through “Did you mean” functionality
       •   Highlighting keywords
       •   Deeper drill down with faceting
       •   Real-time content updating
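
The idea behind data-driven spellchecking and "Did you mean" suggestions can be sketched in a few lines of Python. Solr ships its own spellcheck component; the sketch below is not that component and is not taken from any of the sites named above. It only illustrates the principle, under the assumption that a log of past user queries is available: terms that users have searched for before become the vocabulary, and an unknown term is matched against the closest, most frequently used known term. The function names and the sample query log are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def build_vocabulary(query_log):
    """Count how often each term appears in past user queries."""
    counts = Counter()
    for query in query_log:
        counts.update(query.lower().split())
    return counts


def did_you_mean(term, vocabulary, cutoff=0.75):
    """Suggest the most frequently used close match for an unknown term.

    Returns None when the term is already known or nothing is close enough.
    """
    term = term.lower()
    if term in vocabulary:
        return None
    candidates = get_close_matches(term, list(vocabulary), n=5, cutoff=cutoff)
    if not candidates:
        return None
    # Among similar-looking terms, prefer the one users search for most often.
    return max(candidates, key=lambda candidate: vocabulary[candidate])


# Hypothetical search history, for illustration only.
query_log = [
    "solr faceting",
    "solr replication",
    "lucene index",
    "lucene scoring",
]
vocab = build_vocabulary(query_log)
print(did_you_mean("lucen", vocab))
print(did_you_mean("replicaton", vocab))
```

Because the vocabulary comes from what users actually type, suggestions follow the language of the community rather than a fixed dictionary, which is the point of making the spellchecker data-driven.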
  




Case Study 6

       Digg.com

       Digg displays the wisdom of the crowds. By leveraging the mass collaboration of readers distributed across the Internet (everything on Digg is submitted by the public community for the public community), it builds on the easy findability of information valued by the marketplace of readers and consumers.

       Digg realized early on that to succeed in the business of information, they needed to make information available to their audience as effortlessly as possible. They saw the following challenges as roadblocks to implementing a base search application:

       Requirements
              •   Managing unstructured data (13 million documents and growing) in real time
              •   Providing results faster
              •   Facilitating smart navigation to provide information in digestible portions
              •   Recognizing and eliminating duplicate content
              •   Providing a semantically and linguistically smart application
              •   Facilitating scalability while containing costs

       Digg selected Solr for its unmatched flexibility and functionality.

       The Solr Solution
              •   Highly customizable and flexible
              •   Subsecond results, with simple-to-use pull-downs to refine results
              •   Fuzzy duplicate detection (through custom code)
              •   Unlimited scalability and seamless integration with the heterogeneous environment
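
The source does not describe how Digg coded its fuzzy duplicate detection. As a hedged illustration of one common approach, the Python sketch below breaks each document into overlapping three-word "shingles" and treats two documents as near-duplicates when their shingle sets overlap heavily (Jaccard similarity). The threshold, function names, and sample stories are assumptions made for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(documents, threshold=0.5):
    """Return (id, id, similarity) for every pair at or above the threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in documents.items()}
    ids = sorted(sets)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


# Hypothetical submissions, for illustration only.
stories = {
    "a": "Apache Solr 1.4 released with faster faceting and replication",
    "b": "Apache Solr 1.4 released with faster faceting and Java replication",
    "c": "New museum exhibit opens in Washington this spring",
}
print(find_near_duplicates(stories))  # [('a', 'b', 0.67)]
```

Comparing every pair of documents, as this sketch does, only works for small collections. At the scale of millions of documents, a production system would typically compute a compact signature per document at index time instead.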
  



Case Study 7

       LinkedIn

       Connecting 50 million registered users from 200 countries across 170 industries and matching them to the right professional contacts is what LinkedIn is all about. LinkedIn’s business is premised on an intelligent search application that could overcome the following challenges:

       The Challenges
              •   Managing an ever-growing database, with one new member joining and creating a profile every second
              •   Indexing unstructured data in real time
              •   Giving instant query responses, even in peak traffic hours
              •   Providing intuitive navigation and intelligent linguistic support
              •   Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple sources

       They chose Lucene to implement the search function at the core of their business model.

       The Lucene Solution
              •   Used index segmentation for faster results and to limit index size
              •   Provided faceted search and intelligent support features such as changing the view of search results and auto-completion of contacts
              •   Calculated relative relevance, ranking results on the fly based on the relationship between the user’s profile and the other profiles being searched
              •   Integrated with the latest web tools; for example, incorporating videos in search results
              •   Provided a “scale as you grow” facility through the flexibility of the open source model
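
To make the idea of relationship-based ranking concrete, here is a small Python sketch. It is not LinkedIn's algorithm, which the source does not describe. It assumes, purely for illustration, that each hit carries a text relevance score from the index and that the application knows how many connection steps separate the searcher from each profile. The boost values and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    profile_id: str
    text_score: float  # relevance score returned by the text index


# Hypothetical boost per degree of connection; the values are illustrative only.
DEGREE_BOOST = {1: 2.0, 2: 1.5, 3: 1.1}


def rerank(hits, degrees):
    """Order hits by text score multiplied by a boost for network closeness."""
    def personalized(hit):
        degree = degrees.get(hit.profile_id)  # None if outside the network
        return hit.text_score * DEGREE_BOOST.get(degree, 1.0)
    return sorted(hits, key=personalized, reverse=True)


hits = [Hit("p1", 0.9), Hit("p2", 0.6), Hit("p3", 0.5)]
degrees = {"p2": 1, "p3": 2}  # p1 is outside the searcher's network
ranked = [hit.profile_id for hit in rerank(hits, degrees)]
print(ranked)  # ['p2', 'p1', 'p3']
```

The same query therefore returns a different order for each searcher: a direct connection with a weaker text match can outrank a stranger with a stronger one.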
  	
  




Enterprise (Intranet) Search
Enterprises today have a global footprint, which leads to the creation of multiple content types and the use of disparate applications and content management systems across business centers. The result is often silos of unmanaged data spread across the intranet of an enterprise: a situation where information is omnipresent but cannot be used.

To achieve a competitive advantage, enable intelligent decision making, eliminate duplication of work, and lower the cost of ownership, enterprises need a search application that gives structure to unstructured data and provides a single gateway to search across multiple enterprise repositories, with speed, flexibility, and intuitive intelligence.

Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search application, Lucene/Solr provides robust search features at minimal cost. Built on an open source development model, Lucene/Solr integrates seamlessly with legacy tools and significantly brings down the total cost of ownership.

Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level, role-based security. And with its transparent search algorithms and configurability for relevancy, Lucene/Solr enables intranet search with the precise control enterprise content owners require, ensuring that results consistently deliver the right documents to the right people.

“The search and discovery software market grew 19 percent in 2008 to $2.1 billion”
Sue Feldman, IDC

Requirements
       •   Single interface to access enterprise data
       •   Faster results
       •   Control over search results
       •   Ready integration with existing content management software

Solr Solution
       •   Single gateway for all types of data
       •   Dynamic boosting of content
       •   Transparent search algorithms and relevancy tuning
       •   Customization and easy integration with open source code
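
One way document-level, role-based security is commonly approached with Solr is to index the roles allowed to see each document and attach a matching filter to every query. The Python sketch below illustrates that pattern under stated assumptions: `fq` is Solr's standard filter query parameter, while the field name `allowed_roles`, the role names, and the helper function are hypothetical and not part of Solr itself.

```python
from urllib.parse import urlencode


def secure_query_params(text, user_roles, acl_field="allowed_roles"):
    """Return Solr request parameters that restrict hits to the user's roles."""
    if not user_roles:
        # Fail closed: never fall back to an unfiltered query.
        raise ValueError("user has no roles; refusing to run an unfiltered query")
    quoted = " OR ".join(f'"{role}"' for role in sorted(user_roles))
    return [
        ("q", text),                        # what the user typed
        ("fq", f"{acl_field}:({quoted})"),  # only documents the user may see
        ("wt", "json"),
    ]


# Hypothetical roles, for illustration only.
params = secure_query_params("travel policy", {"hr", "managers"})
print(urlencode(params))
```

Because the filter is added by the application rather than typed by the user, documents outside the user's roles are excluded from results, from counts, and from any facets computed on the result set.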
                          	
  




Case Study 8

       Food and Drug Administration

       The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The FDA has a large repository of information that dates back multiple decades and exists in formats ranging from early optical character recognition to recent electronic formats. To mine this knowledge base, the FDA is developing a semantic mining framework using open source tools such as Apache Lucene and Solr.

       Requirements
              •   Integrating petabytes of data highly distributed across the intranet of an enterprise
              •   Managing multiple indices for documents stored in distributed repositories
              •   Managing and maintaining archival data and evolving vocabularies
              •   Indexing unstructured data in real time
              •   Recognizing and eliminating duplicate content
              •   Handling concurrent queries and delivering fast and relevant results
              •   Restricting search results according to agency access control policies
              •   Integrating with existing infrastructure without additional overhead

       The Lucene Solution
              •   A single gateway to search across multiple enterprise repositories
              •   Duplicate detection
              •   Fast and relevant results with content analysis and query interpretation algorithms
              •   Filtering of results based on the enterprise's access controls and security policies
              •   Integration with existing enterprise infrastructure to reduce TCO
  

       	
  
       	
  
Business Use Case Matrix
To simplify mapping your search needs to existing search applications in the real world, the matrix
below compares business use cases against key search requirements. While not an exhaustive list,
the matrix highlights the different business use cases across sectors and business models, reflecting
the adaptability of Lucene/Solr across the various domains of search applications and use cases.

                                  Users                 Content               Content Update Frequency
Verticals                         Internal  Customer    Original  Aggregated  High   Medium   Low         Access
                                            Facing                                                        Control

Enterprise (Intranet)                √                     √                           √                     √
Education: Schools/Universities      √         √           √          √                √                     √
Education: Libraries                 √         √                      √                √                     √
Job Portals                                    √           √          √        √
Social Networks                                √           √          √        √                             √
Media: News                                    √           √          √        √
Media: Media                                   √           √          √        √
E-Commerce Sites                               √           √          √        √       √                     √
Financial Services                   √         √           √                   √                             √
Yellow Pages                                   √                      √                √
Horizontal Portals                             √           √          √        √

	
  



Appendix: Lucene/Solr Features and Benefits
Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In
choosing a search solution that is best suited for your requirements, key factors to consider are
application scope, development environment, and software development preferences.
  
Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete
query capabilities, portability, scalability, low-overhead indexes, and rapid incremental
indexing.
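To make the idea of a search library more concrete, the sketch below builds a toy inverted index in
plain Java: a map from each term to the documents that contain it. It is an illustration only. It
does not use Lucene's API or reflect its implementation, and the class and method names are invented
for this example. It shows incremental indexing in miniature: each call to add makes one more
document searchable without rebuilding the index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration; names are hypothetical, not Lucene classes.
class ToyInvertedIndex {

    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Incremental indexing: adds one document without rebuilding the index.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents containing the term, in ascending order.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Open source search");
        index.add(2, "Search server");
        System.out.println(index.search("search"));
    }
}
```

A production library adds much more on top of this structure, such as relevancy ranking and
compact on-disk storage, which is where most of the engineering effort lies.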
  	
  
Solr is the Lucene Search Server. It presents a web service layer built atop the Lucene search
library, extending it to provide application users with a ready-to-use search platform.
Solr brings with it operational and administrative capabilities such as web services, faceting,
a configurable schema, caching, replication, and administrative tools for configuration, data
loading, statistics, logging, cache management, and more.
  
Lucene presents a collection of directly callable Java libraries and requires coding and solid
information retrieval experience. Solr extends the capabilities of Lucene to provide an
enterprise-ready search platform, eliminating the need for extensive programming.
  
Solr provides the starting point for most developers who are building a Lucene-based search
application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to
scale in a production Java environment.
  
With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based
configuration files, Solr can greatly accelerate application development and maintenance. In fact,
Lucene programmers have often reported that they find Solr contains “the same features I was
going to build myself as a framework for Lucene, but already very well implemented.” Using Solr,
enterprises can customize the search application according to their requirements, without
incurring the cost and risk of writing the code from scratch.
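As an illustration of the HTTP interface described above, the following Java sketch assembles the
URL for a Solr search request. It is a minimal example under stated assumptions, not code from any
production deployment: the host, port, and path, along with the field names title, inStock, and
brand, are hypothetical, and the exact request path depends on the Solr version and configuration.
The parameter names q, fq, rows, facet, and facet.field are standard Solr request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds the URL for a Solr search request. Nothing is sent over the network.
class SolrSelectUrl {

    // baseUrl is an assumption, e.g. "http://localhost:8983/solr/select";
    // the real path depends on the Solr version and how it is deployed.
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return baseUrl + "?" + query;
    }

    // Percent-encodes a parameter name or value for use in a query string.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:laptop");     // fielded query; "title" is an assumed field
        params.put("fq", "inStock:true");    // filter query; "inStock" is an assumed field
        params.put("rows", "10");            // number of results to return
        params.put("facet", "true");         // turn on faceting
        params.put("facet.field", "brand");  // facet on an assumed "brand" field
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```

Because the request is an ordinary URL, any language or tool that can issue an HTTP GET can call
it, which is a large part of why integration with existing applications tends to be simple.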
  
Lucene provides greater control of your source code and works best in development environments
where resources need to be controlled exclusively by Java API calls. It works best when
constructing and embedding a state-of-the-art search engine, allowing programmers to assemble
and compile it inside a native Java application. While working with Lucene, programmers can directly
control a large set of sophisticated features through low-level access and data or state manipulation.
  
Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it
provides ease of use and scalable search power out of the box.
  




As functional siblings, Lucene and Solr have become popular alternatives for search applications;
the two differ mainly in the style of application development used. Key benefits of search with
Lucene/Solr include:
  
       •   Search Quality: Speed, Relevance, and Precision
           Lucene/Solr provides near-real-time search and strong relevance ranking to deliver
           contextually relevant and accurate results very quickly. Tailor-made coding for relevancy
           ranking and sophisticated search capabilities like faceted search help users sort,
           organize, classify, and structure retrieved information so that search delivers the
           desired results. Search with Lucene/Solr also provides proximity operators, wildcards,
           fielded searching, term/field/document weights, find-similar functions, spell checking,
           multilingual search, and much more.
  
       •   Lower Cost and Greater Flexibility, Plug and Play Architecture
           Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source
           software, it does not require purchase of a license and is freely available for use. The
           open source code can be used as is, modified, customized, and updated as appropriate to
           your needs. Solr is easily embedded in your enterprise’s existing infrastructure,
           reducing costs of installation, configuration, and management.
  
       •   Open Source Platform for Portability and Easy Deployment
           Because Lucene/Solr is an open-source software solution, it is based on open standards
           and community-driven development processes. It is highly portable and can run on any
           platform that supports Java. For instance, you can build an index on Linux, copy it to a
           Microsoft Windows machine, and search there. This portability enables you to keep your
           search application in step with your company’s evolving infrastructure. Lucene, in turn,
           has been implemented in other environments, including C#, C, Python, and PHP. At
           deployment time, Solr offers very flexible options; it can be easily deployed on a single
           server as well as on distributed, multiserver systems.
  
       •   Largest Installed Base of Applications, Increasing Customer Base
           Lucene/Solr is the most widely used open source search system and is installed in around
           4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include
           CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is
           also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
  




       •   Large Developer Base and Adaptability
           As community-developed software, Lucene/Solr provides transparent development and easy
           access to updates and releases. Developers can work with open source code and customize
           the software according to business-specific needs and objectives. Its open source
           paradigm gives developers the freedom and flexibility to evolve the software with
           changing requirements, liberating them from the constraints of commercial vendors.
  
       •   Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination
           Lucid Imagination provides the expertise, resources, and services that are needed to help
           enterprises deploy and develop Lucene-based search solutions efficiently and
           cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy
           with its broad range of expertise, which includes indexing and metadata management,
           content analysis, business rule application, and natural language processing. Lucid
           Imagination also offers certified distributions of Lucene and Solr, commercial-grade
           SLA-based support, training, high-level consulting, and value-added software extensions to
           enable customers to create powerful and successful search applications.
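Faceted search, mentioned under Search Quality above, can be pictured as counting how many matching
documents carry each value of a field, so that users can narrow results by brand, category, or
date. The Java sketch below shows that counting step on an in-memory list. It is a simplified
illustration, not Solr's implementation; the class name, the method name, and the brand field are
assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative facet counting; names are hypothetical, not part of Lucene or Solr.
class FacetCountSketch {

    // Counts how many documents carry each value of the given field.
    // Documents that lack the field are skipped.
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = List.of(
                Map.of("title", "Laptop A", "brand", "Acme"),
                Map.of("title", "Laptop B", "brand", "Acme"),
                Map.of("title", "Laptop C", "brand", "Zenith"));
        System.out.println(facetCounts(results, "brand"));
    }
}
```

In a search interface, counts like these typically appear next to each filter option, showing
users how many results remain if they narrow the search.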
  





Open Source Search Applications

  • 1.
                                                                        The  Case  for  Lucene/Solr:     A  Manager’s  Guide     to  Real  World     Open  Source     Search  Applications           By  Lucid  Imagination    
  • 2.
                                                        Abstract   In  today’s  information-­‐driven  environment,  search  is  a  critical  solution  to  problems  when  it  slashes   the  time  and  effort  separating  end  users  from  the  data  they  value.  Search  spans  the  range  of   business  models  and  use  cases—from  driving  direct  customer  sales,  to  analytics  and  business   intelligence,  employee  productivity,  and  reduced  administrative  overhead.  Making  the  best  use  of   search  requires  two  perspectives:  both  a  look  at  the  business  requirements  for  a  search  application   and  a  view  to  new  business  opportunities  created  by  using  search  to  leverage  the  organization’s   content  resources.       Thousands  of  organizations  across  different  sectors  and  business  models  have  harnessed  Apache   Lucene/Solr  to  search  their  rapidly  growing  and  diversifying  content  resources.  Underlying  this   broad  adoption  is  the  extraordinary  power,  scalability,  and  versatility  of  open  source  search   technologies.       This  paper  provides  an  overview  of  both  the  requirements  and  the  opportunities  for  search   applications.  It  then  explores  how  real  world  organizations  are  successfully  using  Lucene/Solr   search  applications  to  meet  those  opportunities,  presenting  how  the  technology  is  used  for  specific   business  models  and  use  cases  across  industries.  In  addition,  it  offers  a  baseline  for  setting  search   requirements  that  managers  and  architects  can  use  to  adopt  Lucene/Solr,  and  adapt  this  open   source  search  technology  to  the  unique  needs  of  their  business.                       ©  2010,  Lucid  Imagination   The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page ii
  • 3.
                                                        Table  of  Contents   Introduction ............................................................................................................................................................... 1   Understanding  Search  Opportunities  and  Requirements ...................................................................... 2   What  Data  and  Documents  Are  You  Searching? ................................................................................ 3   Who  Needs  the  Results  and  Why? ........................................................................................................... 3   Where  Is  Search  Integrated  with  IT  Infrastructure? ....................................................................... 5   How  Is  the  Search  Interface  Presented  to  the  User?........................................................................ 5   The  Real  World:  Applications  and  Case  Studies ......................................................................................... 7   Yellow  Pages,  Local  Search,  and  Searching  Classifieds........................................................................ 
8   Media .......................................................................................................................................................................10   E-­‐commerce..........................................................................................................................................................12   Job  and  Career  Sites ..........................................................................................................................................14   Libraries,  Archives,  and  Museums  (LAMs)  Search ..............................................................................16   Social  Media  Search...........................................................................................................................................18   Enterprise  (Intranet)  Search.........................................................................................................................21   Business  Use  Case  Matrix ...................................................................................................................................23   Appendix:  Lucene/Solr  Features  and  Benefits..........................................................................................24     The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page iii
  • 4.
                                                      Introduction As  fast  as  companies,  communities,  and  consumers  produce  data—about  each  other,  products,   opinions,  research,  and  everything  else  imaginable—they  need  faster,  more  versatile  search   capabilities  to  find  the  information  they  need  to  create  opportunities  for  competitive  advantage.  In   today’s  information-­‐driven  environment,  search  addresses  the  critical  problems  created  by  the   explosive  growth  of  content  by  slashing  the  time  and  effort  users  expend  in  finding  data  they  value.   Search  spans  the  range  of  business  models  and  use  cases:  from  driving  direct  customer  sales,  to   analytics  and  business  intelligence,  employee  productivity,  and  reduced  administrative  overhead.     Apache  Lucene/Solr1  open  source  search  technology  has  been  implemented  across  the  broadest   range  of  applications  and  business  models—and  likely  in  ways  that  can  fit  the  needs  of  your   organization.  In  successful  operation  today  at  thousands  of  enterprises,  Lucene/Solr  technology   scales  from  tens  of  thousands  to  hundreds  and  billions  of  documents;  searches  data  that  is   structured,  unstructured,  and  in  combination;  data  inside  and  outside  the  firewall;  and  ranges  in   use  from  a  simple  website  search  box  through  sophisticated  faceted  navigation.  It  addresses  equally   diverse  business  processes  and  mission  critical  applications.  Across  the  spectrum,  Lucene/Solr   helps  users  find,  make  sense  of,  and  act  upon  information  quickly  and  efficiently.   In  this  white  paper,  we’ll  review  real-­‐world  case  studies  for  Lucene/Solr  functionality  across   business  sectors  to  demonstrate  its  versatility  and  varied  applicability.  
The  diversity  of  examples   provides  strong  evidence  of  Lucene/Solr’s  flexibility  and  power  as  a  search  technology.  The   examples  also  attest  to  the  innovation  and  transparency  inherent  to  the  open  source  development   model.  Our  focus  is  on  familiarizing  the  audience  of  business  managers  and  application  owners  with   existing  Lucene/Solr  applications;  the  substantial  technical  advantages  to  developers  are  covered   elsewhere.                                                                                                                     1 Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix. The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page 1
  • 5.
                                                      We’ll  first  survey  the  key  requirements  and  business  use  cases  of  search  and  then  look  at  where   they  are  built  into  search  applications.  Our  objective  is  to  provide  business  managers  and   application  owners  with  a  broad  perspective  on  how  Lucene/Solr  search  technology  is  used  to  build   solutions  to  compelling  business  problems.  In  the  Appendix,  we  provide  an  overview  of   Lucene/Solr’s  key  features  and  benefits,  with  a  basic  outline  of  the  capabilities  offered  to  meet  the   broadest  range  of  business  needs.     Understanding Search Opportunities and Requirements Search  technology  has  come  a  long  way  from  its  roots  in  matching  keywords  with  appearance  in   documents  and  obtaining  undifferentiated  results.  Search  today  empowers  users  by  delivering   actionable  information  quickly  and  efficiently,  across  multiple,  diverse  sources  of  data.  The   business  use  cases  range  from  executing  mission  critical  commercial  transactions  (e.g.,  e-­‐commerce   sites)  to  unlocking  employee  and  end-­‐user  productivity  in  the  search  for  a  single  relevant  document   (e.g.,  enterprise  search).     Given  the  breadth  of  capability  of  the  problem  domain,  it’s  useful  to  look  at  search  and  ask  two   fundamental  questions:  “How  it  can  it  solve  my  business  problems?”  and  “What  new  business   opportunities  can  search  solve  for?”   In  considering  how  search  technology  solves  business  problems,  it  is  useful  to  start  with  an   elucidation  of  the  requirements  you’ll  need  to  consider  for  your  search  application.  At  the  same   time,  be  sure  to  look  more  broadly  at  the  capabilities  that  Lucene/Solr  offers,  as  it  can  help  open  up   new  frontiers  for  incorporating  search  and  leveraging  more  value  from  data  repositories.     
Starting  with  some  basic  questions—what,  who,  how,  and  where—you  can  clarify  the  high-­‐level   business  requirements  specific  to  your  business  needs,  which  in  turn  allow  you  to  make  the  best   decisions  for  your  search  application.  The  process  of  looking  at  the  fundamentals  also  raises  new   questions  about  how  and  where  the  search  technology  offered  by  Lucene  and  Solr  can  create  new   business  opportunities.   Let’s  look  at  four  fundamental  questions  you  should  address  in  understanding  search  opportunities   and  requirements:   • What  data  and  documents  are  you  searching?     • Who  needs  the  results  and  why?     • Where  is  search  integrated  with  IT  Infrastructure?         • How  is  the  search  interface  presented  to  the  user?     The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page 2
  • 6.
                                                      What Data and Documents Are You Searching? Business  today  is  driven  more  than  ever  by  the  end-­‐users’  creation  and  consumption  of  real-­‐time   information.  A  key  differentiating  capability  of  search  technology  is  ingesting  a  broad  range  of   content  types  and  processing  large  collections  of  diverse  data  in  real  time  in  order  to  deliver   actionable  information.  Two  aspects  to  consider:   • Types  of  Content   Content  comes  in  multiple  formats:  HTML  pages,  XML  files,  PDFs,  images,  PowerPoint   presentations,  Excel  spreadsheets,  Word  documents,  log  files,  multimedia  content,  and   more.  Content  resides  in  various  repositories,  including  databases,  file  servers,  content   management  systems,  archiving  systems,  collaboration  applications,  and  employee   desktops  and  laptops.  Search  technology  must  be  able  to  locate,  organize,  and  aggregate   data  whatever  its  form  or  location.     • Frequency  of  Updating  Content   Organizations  update  content  at  varying  intervals,  driven  by  differing  business  processes   and  models—social  media  or  news  applications  have  real-­‐time  content  need,  whereas  an  e-­‐ commerce  application  might  re-­‐index  in  response  to  new  inventory  on  a  batch  basis  and  a   research  institution  might  add  to  its  collection  less  often  still.  Search  applications  need  to  be   adaptable  to  the  differences  in  content  change  frequency.   Who Needs the Results and Why? Business  search  puts  a  high  priority  on  end  user  experience  and  results  in  which  the  searched   content  is  tuned  to  the  unique  needs  of  each  user.  Because,  after  all,  the  human  dimension—the   usefulness  of  results  and  the  efficacy  of  interaction—is  the  acid  test  of  a  search  application.  
Internet   search  applications  like  Google,  Yahoo,  and  Bing  are  now  common  and  mature.  They  have  raised   user  expectations  about  key  qualities  of  the  search  experience...but  they  solve  a  very  different   problem.     While  Internet  searches  can  produce  millions  of  results  in  milliseconds,  they  rely  on  measures  like   website  popularity  or  URLs  and  domain  names—not  relevant  and  not  generally  applicable  to   purpose-­‐built  applications  for  businesses.  What’s  more,  they  rely  on  generalizing  relevancy  for  a   global  population  of  all  Internet  users,  without  being  tied  to  business  rules,  or  business  process   logic,  or  the  opportunity  cost  of  improved  precision  for  a  specific  set  of  data  or  search  users.   Business  search  applications  cannot  rely  on  such  brute  force  coarse  approaches  to  tune  their   results.  They  need  far  more  control  and  precision.  They  have  to  be  able  to  deliver  highly  useful   results  while  matching,  if  not  exceeding,  the  levels  of  user  experience  that  people  have  come  to   expect  by  virtue  of  their  daily  interactions  with  commercial  search  engines.  Key  points  of   consideration  from  a  business  perspective  are:   The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page 3
  • 7.
• Relevance
Relevance is entirely a factor of the goals of the search application’s users. The application must have the mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easier ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper makes results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow search scope can also lead to a richer set of more useful results, with less time and effort.

• Cost of Relevance
As business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to search, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should help make employees more productive by reducing the amount of time and effort needed to find the documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.

• Precision Ranking
Accurate results, sorted by attributes such as relevance, date, field, or any other document property, make the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.

• Query Response Speed
Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. Fast, relevant results cannot be limited by search technology hamstrung by data influx or query overload. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.
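The filters, proximity values, and sorting parameters mentioned under Relevance correspond to ordinary query parameters in Solr. The Python sketch below only builds a request URL; it does not contact a server. The q, fq, sort, and rows parameters and the "~N" proximity syntax are standard Solr/Lucene query features, while the endpoint URL and the field names (description, category, price) are assumptions made purely for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_select_url(base_url, text, filters=(), sort=None, rows=10):
    """Build a Solr-style select URL from a query, filters, and a sort order."""
    params = [("q", text), ("rows", str(rows))]
    # Each fq (filter query) narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    if sort:
        params.append(("sort", sort))
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/select",   # assumed local endpoint
    'description:"leather boots"~3',        # proximity: terms within 3 positions
    filters=["category:footwear", "price:[50 TO 150]"],  # hypothetical fields
    sort="price asc",
)

# Decode the query string again to show what the server would receive.
parsed = parse_qs(urlsplit(url).query)
print(url)
print(parsed["fq"])
```

Keeping the narrowing criteria in separate fq parameters, rather than folding them into q, means a user can add or remove one filter at a time without having to rewrite the query terms.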
Where Is Search Integrated with IT Infrastructure?

Useful, valuable search technology rarely exists in isolation. Searched data is transformed into actionable information when it is integrated with the organization’s information infrastructure, from business processes to business intelligence to content management systems. A robust search technology must be customizable so that it integrates seamlessly with existing systems.

• Application Integration
A key requirement for a search application is its extensibility for integration with existing infrastructure and applications such as content management systems, databases, and the full range of business processes and applications. It should have interfaces that support ingestion of data as well as delivery of results in readily consumable formats, because in many cases results are consumed by other applications, not by a human.

• Scalability
We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the count of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, often resulting in a negative return.

• Security
Every organization has its own security requirements and access controls. Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.

How Is the Search Interface Presented to the User?

The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:
• Navigation
Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.

The Netflix search application is powered by Solr; it adds a fuzzy dimension to search, with auto-completion of movie names, correction of misspelled names of actors, and suggestions of the titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.

• Discovery
Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in search results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load the document are great ways to significantly improve usability.

• Intuitive Intelligence
Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).
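The faceted search mentioned under Navigation boils down to two operations: counting how many results carry each value of an attribute, and narrowing the results when the user picks one of those values. The following Python sketch illustrates the idea on a small in-memory result list. It is not how Solr implements faceting internally, and the documents, field names, and helper functions are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many matching documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


def drill_down(docs, field, value):
    """Keep only the documents whose `field` equals the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]


# Hypothetical search results for the query "runner OR boot".
results = [
    {"title": "Trail runner", "brand": "Acme", "color": "blue"},
    {"title": "Road runner", "brand": "Acme", "color": "red"},
    {"title": "Hiking boot", "brand": "Summit", "color": "blue"},
]

brand_facets = facet_counts(results, "brand")   # shown next to the result list
blue_only = drill_down(results, "color", "blue")  # user clicks the "blue" facet

print(brand_facets.most_common())
print([doc["title"] for doc in blue_only])
```

Showing the counts beside each facet value tells users in advance how many results a click will leave them with, which is what makes narrowing a search feel like navigation rather than guesswork.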
The Real World: Applications and Case Studies

With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.

All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission-critical capability that turns search into a robust competitive advantage.

For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs. Business use case studies we’ll review include:

• Yellow Pages, Local Search, and Searching Classifieds
• Media
• E-commerce
• Job and Career Sites
• Libraries, Archives, and Museums (LAMs) Search
• Social Media Search
• Enterprise (Intranet) Search
Yellow Pages, Local Search, and Searching Classifieds

In the business of online local search, geographic-based (location) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.

Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search algorithms that are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer, or pay for exposure to, proprietary back-end logic.

Internet Yellow Pages and local online search is forecast to grow to $27.8 billion in 2011.
The Kelsey Report¹

Requirements
• Intelligent results going beyond keyword search
• Deeper, faceted navigation
• Seamless integration with latest Web 2.0 tools
• Lower IT-related costs
• Geocentric user experience
• Search numeric values

Solr Solution
• Customizable search index which can be tuned transparently to account for key findability drivers
• Drop down filters for narrowing or widening the scope of search
• Seamless integration with existing technologies
• Native numeric encoding and search capabilities
• Reduced server footprint for lower TCO than most commercial vendors

Success Stories
• YP.com, a division of AT&T Interactive
• Zvents.com, local event search service
• Yelp.com, the community local search site

¹ The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook
Case Study 1
yp.com by AT&T Interactive

AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com, an online business listing and advertising site, was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.

Business Needs
• Addressing the need to factor in location to support geographic search, and include relevant comments
• Striking a balance between organic search and advertised content
• Indexing highly unstructured content such as user comments
• Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
• Linguistic support to improve the search experience, such as spellchecking, synonyms, find-similar, etc.
• Integrating with latest Web 2.0 tools
• Reducing server footprint

The Solr Solution
• Context-specific relevancy, geographic proximity, ad placement, and user comments
• Faceting and drop down filters to narrow or widen the scope of search
• Functional support for creating new features
• Spell-correction and location-optimized search results to show users the businesses nearest to them first
• Seamless integration with many Web 2.0 tools to create innovative features and mashups
• Lower TCO by reducing the number of search servers from 120 to two dozen
Media

Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation.

Given that content is the business, the need for a robust search application ties directly to competitive advantage.

Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It addresses the dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class, innovative search experience to differentiate themselves in a highly competitive marketplace.

“Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.”
Doug Steigerwald, McClatchy Interactive

Requirements
• Real-time indexing of petabytes of structured and unstructured data
• Deeper search capability
• Improved query response time
• Reduced infrastructure and customization costs

Solr Solution
• Reverse (inverted) indexing
• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Customization with open source code

Success Stories
• McClatchy Newspapers
• Netflix
• Comcast Interactive
• MTV Networks, a division of Viacom
• The Motley Fool, fool.com
• Fanfeedr.com, personalized sports aggregator
Case Study 2
McClatchy: Leading Newspaper Publisher

The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution, to empower the McClatchy audience with the information they wanted and secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value.

Requirements
• Proliferating content and data sources (text, video, audio, images), with real-time streaming
• Empowering end users with ease of use
• Supporting peak traffic and popular search spikes with consistent performance
• Providing scalability for a database growing by orders of magnitude annually
• Providing flexibility to support customization
• Controlling IT costs while exceeding performance benchmarks of competition

The Lucene/Solr Solution
• Deeper content by indexing both structured and unstructured data in real time, effortlessly
• Indexes millions of documents, with search results delivered in milliseconds
• User-friendly navigation with drop down filters, faceted navigation, linguistic corrections, etc.
• Excellent performance, even in peak hours, by load-balancing search requests across servers
• Scalability without impact on performance
• High degree of customization, since it’s open source
• Integration with existing IT infrastructure and elimination of associated license fees to cut costs
• 8-fold reduction in server footprint
E-commerce

E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market landscape. By reducing the time and effort customers need to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience. Search then translates directly into higher revenues and customer loyalty. Instant results, intuitive organization, advanced faceting for easy browsing, synchronization of results with images, and integration with user ratings are among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all enable users to find products they want, driving conversion rates and enabling a winning online experience.

Online retail sales in the B2C market are expected to reach $340 billion by 2013.²
Forrester Research

Requirements
• Multidimensional, dynamic search
• Faster results
• Real-time indexing of products
• Faceting and browsing capabilities
• Seamless integration with existing IT infrastructure

Solr Solution
• Faceted search for deeper drill down and browsing
• Intuitive search capabilities for cross-channel shopping experience
• System administration tools for data loading, index replication, monitoring, logging, and cache management
• Query monitoring for better highlighting of popular products

Success Stories
• Buy.com
• Sears.com
• Macys.com
• Zappos.com
• Advanceautoparts.com
• Dollardays.com

² “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.
Case Study 3
Zappos

Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than in any other shoe store, on or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution. After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to ensure continued, successful deployment.

Requirements
• Simplified, attractive user experience that makes it easy to find and buy
• Relevant results, fast
• Navigation across attributes, such as size, color, and style, for broader and deeper results
• Indexing products as they are entered in the catalogs
• Cross-functional navigation to give customers a realistic shopping experience
• Intuitive intelligence to provide alternate suggestions
• Analytical capabilities to drive business strategy
• Facilitating control over results
• Integration with existing IT infrastructure

The Solr Solution
• Subsecond search results, across categories
• Faceting, for easy browsing and discovery and a compelling user experience
• Real-time indexing of products
• Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
• Information on user activity to help build strategy on product promotions
• Controls to rank popular or high-stock products in results where users are more likely to buy them
• Integration with a heterogeneous open source environment
Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability, matching résumés to job listings and vice versa, so both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to navigate through filters such as education, previous employer, salary history, and skillsets to narrow the scope of a candidate search; a job seeker may want to expose these attributes, but keep a current employer’s name confidential. A job seeker may also want to apply to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry by enabling linguistic intelligence (handling identical acronyms that correspond to different entities, variations in spelling, and imperfectly constructed search queries); indexing unstructured data (résumés); and managing ever-growing data.

“I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.”
Peter Keegan, Monster.com

Requirements
• Linguistic intelligence for more relevant results
• Control over search results to maintain privacy
• Deeper search capability
• Numeric search
• Faster query response
• Reduced infrastructure and customization costs

Solr Solution
• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Business process integration and customization with open source code

Success Stories
• Monster
• The Big Jobs
• eBharatJobs
• Careerjet
Case Study 4
Monster.com

Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database, serving over 63 million job seekers per month, and it now runs on average 300 to 400 queries per second with an average response time of 40 milliseconds. To provide the highest level of service and support to their customers, both employers and job seekers, Monster has an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model.

The Requirements
• Managing high volumes of data, continually increasing by double digit percentages annually
• Maintaining constant inventory updates and providing faster results
• Removing technological barriers that limit the scope of information
• Enabling end users to refine search and drill deeper without any performance impact
• Providing security controls to ensure end user privacy
• Facilitating scalability and flexibility in tandem with company’s vision and growth plans

The Lucene Solution
• Handling high volumes of data by clustering data to reduce the index size
• Real-time indexing for fresher, faster query results
• Intuitive search to enable in-depth cross-functional job and résumé browsing
• Faceted search and ‘single click’ filters for search refinement
• Security controls to manage user information
• Unlimited scalability and customization leveraging open source licensing

Libraries, Archives, and Museums (LAMs) Search

The core asset of educational and research institutions is knowledge archived and accumulated over decades. In the world of academic search, the diversity of information for any query (text, illustration, audio/video media, or data in any other format) makes unstructured formats a key aspect of the searchable archive.

Lucene/Solr gives academic and research institutions the power to turn information into knowledge by going beyond keyword-driven search to expose a rich variety of results and exploration. Based on the open source model, it not only integrates with the existing IT infrastructure but also leverages the existing classification hierarchies to give structure to terabytes of information spread across disparate collections, significantly reducing overhead and enabling flexible and scalable deployment.

“With Solr, you can do so many things without writing a lick of code. I hadn't realized how easy it is to extend our custom request handler, response writer, and update handler. Just move it all to Solr and let it do the heavy lifting.”
Sjoerd Siebinga, Europeana

Requirements
• Management of multiple formats of data and documents
• Customization and scalability
• Linguistic support in queries
• Faster results

Solr Solution
• Optimized index infrastructure limits size without compromising speed or flexibility
• Easy customization for implementing taxonomy rules
• Faceted search to narrow results to a specific source across diverse sets of data
• Instant results
• Seamless integration with IT infrastructure for lower TCO

Success Stories
• Smithsonian Institution
• Europeana, the European Union online cultural archive
• The US Library of Congress and World Digital Library
• Stanford University Library
• University of Michigan Graduate Library
Case Study 5
Smithsonian

The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute that provides “one-stop” searching for 2 million records, including nearly a quarter of a million media files (images, media files, online journals, and other resources) distributed across dozens of archives, databases, museums, and libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient search solution that could overcome the following challenges:

The Challenges
• Managing a complicated taxonomy that could no longer accommodate a growing data index
• Indexing disparate types of content, including documents, videos, and images
• Making information available from a large database
• Providing access controls to restrict information
• Integrating with existing legacy tools

Smithsonian chose Lucene/Solr, and worked with Lucid Imagination to create an optimized, well-designed solution.

The Solr Solution
• Efficient index strategy to manage a mix of structured and unstructured data
• Holistic search, by optimizing configuration to reduce the number of servers and better handle query requests
• Filtering information through faceted search
• Access controls to restrict information based on membership profiles
• Integration with the existing IT infrastructure
• Guidance and assistance on setting up a replicated search environment
Social Media Search

Search solutions must support differentiated business models matching Web 2.0 innovations, including user-generated content and mashups, without compromising scalability. That is a challenge, given the virtually limitless content on the Internet. Success and differentiation are measured by how well the site provides relevant results to grow its user base and keep users engaged.

Increasingly, the technological factors driving Web 2.0 application paradigms are finding their way into the enterprise, unlocking collaboration and productivity in new ways that challenge conventional organizational bounds, and that rely in equal measure on search to create the connections between employees that enable discovery, cross-pollination, and more efficient collective effort.

Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to help end users connect with others. It boosts the reach and performance of search, while cutting implementation costs and lowering barriers to innovation.

“With Solr, we really treat it as kind of a platform where we can build other kind of things on top of it… We have a very valuable set of data, and we really want to explore new ways of building new features from that data set.”
Sammy Yu, Digg.com

Requirements
• Deliver search results as soon as content is available
• Deeper drill down capabilities
• Intuitive interface

Lucene/Solr Solution
• Near-instant results with segmentable indexing
• Intuitive search
• Data-driven spellchecking based on user search histories
• Linguistic support through “Did you mean” functionality
• Highlighting keywords
• Deeper drill down with faceting
• Real-time content updating

Success Stories
• Digg
• Myspace
• LinkedIn
• Reddit
• Technorati
• Scout Labs
• Xmarks.com
Case Study 6
Digg.com

Digg displays the wisdom of the crowds. By leveraging the mass collaboration of readers distributed across the Internet (everything on Digg is submitted by the public community for the public community), it builds on the easy findability of information valued by the marketplace of readers and consumers.

Digg realized early on that to succeed in the business of information, they needed to make information available to their audience as effortlessly as possible. They saw the following challenges as roadblocks for implementing a base search application:

Requirements
• Managing unstructured data (13 million documents and growing) in real time
• Providing results faster
• Facilitating smart navigation to provide information in digestible portions
• Recognizing and eliminating duplicate content
• Providing a semantically and linguistically smart application
• Facilitating scalability while containing costs

Digg selected Solr for its unmatched flexibility and functionality.

The Solr Solution
• Highly customizable and flexible
• Subsecond results, with simple-to-use pull downs to refine results
• Fuzzy duplicate detection (by coding)
• Unlimited scalability and seamless integration with the heterogeneous environment
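The paper does not describe how Digg's fuzzy duplicate detection was coded. As one common way to approach the problem, and not a description of Digg's implementation, the Python sketch below breaks each text into overlapping word pairs ("shingles") and treats two texts as near-duplicates when the Jaccard similarity of their shingle sets reaches a threshold. The function names, shingle size, threshold, and sample headlines are all assumptions chosen for illustration.

```python
def shingles(text, size=2):
    """Return the set of overlapping word n-grams ("shingles") in `text`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_near_duplicate(text_a, text_b, threshold=0.5):
    """Treat two texts as fuzzy duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Hypothetical submissions: the first two differ by a single word.
story_a = "Open source search engine powers major news site"
story_b = "Open source search engine powers a major news site"
story_c = "Local bakery wins regional award for sourdough"

print(is_near_duplicate(story_a, story_b))
print(is_near_duplicate(story_a, story_c))
```

Comparing every pair of documents this way does not scale to millions of submissions; production systems typically reduce each document to a compact signature first, but the similarity test at the core is the same idea.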
Case Study 7
LinkedIn

Connecting 50 million registered users from 200 countries across 170 industries and matching them to the right professional contacts is what LinkedIn is all about. LinkedIn's business is premised on an intelligent search application that can overcome the following challenges:

The Challenges
• Managing an ever-growing database, with one new member joining and creating a profile every second
• Indexing unstructured data in real time
• Giving instant query responses, even in peak traffic hours
• Providing intuitive navigation and intelligent linguistic support
• Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple sources

LinkedIn chose Lucene to implement the search function at the core of its business model.

The Lucene Solution
• Used index segmentation for faster results and to limit the index base
• Provided faceted search and intelligent support features such as changing the view of search results and auto-completion of contacts
• Calculated relative relevance, ranking results on the fly based on the relationship between the user's profile and the other profiles being searched
• Integrated with the latest web tools; for example, incorporating videos in search results
• Provided a "scale as you grow" facility through the flexibility of the open source model
Enterprise (Intranet) Search

Enterprises today have a global footprint, which leads to the creation of multiple content types and the use of disparate applications and content management systems across business centers. The result is often silos of unmanaged data spread across the intranet of an enterprise, a situation where information is omnipresent but cannot be used.

To achieve a competitive advantage, enable intelligent decision making, eliminate duplication of work, and lower the cost of ownership, enterprises need a search application that gives structure to unstructured data and provides a single gateway to search across multiple enterprise repositories with speed, flexibility, and intuitive intelligence.

Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search application, Lucene/Solr provides robust search features at minimal cost. The open source development model behind Lucene/Solr integrates seamlessly with legacy tools and brings down the total cost of ownership significantly.

Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level, role-based security. And with its transparent search algorithms and configurable relevancy, Lucene/Solr enables intranet search with the precise control enterprise content owners require, ensuring that results consistently deliver the right documents to the right people.

Requirements
• Single interface to access enterprise data
• Faster results
• Control over search results
• Ready integration with existing content management software

Solr Solution
• Single gateway for all types of data
• Dynamic boosting of content
• Transparent search algorithms and relevancy tuning
• Customization and easy integration with open source code

"The search and discovery software market grew 19 percent in 2008 to $2.1 billion"
Sue Feldman, IDC
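One way to approach the document-level, role-based security described above is to attach a filter to every query so that only documents tagged with one of the user's roles can match. The sketch below builds Solr request parameters using the standard `q`, `fq` (filter query), and `wt` parameters. The field name `acl_roles`, the role names, and the helper functions are hypothetical; it assumes each document was indexed with its permitted roles in that field.

```python
from urllib.parse import urlencode

def escape_term(value):
    """Quote a value so the query parser treats it as one literal phrase."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def secured_query_params(user_query, user_roles, acl_field="acl_roles"):
    """Build request parameters that restrict results to the user's roles.

    acl_field is a hypothetical multi-valued field holding the roles
    allowed to see each document.
    """
    if not user_roles:
        # Fail closed: never run an unfiltered search for a user with no roles.
        raise ValueError("refusing to search without at least one role")
    role_clause = " OR ".join(escape_term(role) for role in sorted(user_roles))
    return [
        ("q", user_query),                        # what the user typed
        ("fq", f"{acl_field}:({role_clause})"),   # access-control filter
        ("wt", "json"),                           # response format
    ]

params = secured_query_params("quarterly forecast", {"finance", "executives"})
query_string = urlencode(params)
```

Because the filter is applied by the search application and not by the end user, the role list must come from a trusted source such as the enterprise directory, never from the request itself.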
Case Study 8
Food and Drug Administration

The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The FDA has a large repository of information that dates back multiple decades and exists in formats ranging from early optical character recognition to recent electronic formats. To mine this knowledge base, the FDA is developing a semantic mining framework using open source tools such as Apache Lucene and Solr.

Requirements
• Integrating petabytes of data highly distributed across the intranet of an enterprise
• Managing multiple indices for documents stored in distributed repositories
• Managing and maintaining archival data and evolving vocabularies
• Indexing unstructured data in real time
• Recognizing and eliminating duplicate content
• Handling concurrent queries and delivering fast and relevant results
• Restricting search results according to agency access control policies
• Integrating with existing infrastructure without additional overhead

The Lucene Solution
• A single gateway to search across multiple enterprise repositories
• Duplicate detection
• Fast and relevant results with content analysis and query interpretation algorithms
• Filtering of results based on the access controls and security policies of an enterprise
• Integration with existing enterprise infrastructure to reduce TCO
Business Use Case Matrix

To simplify mapping your search needs to existing search applications in the real world, the matrix below compares business use cases against key search requirements. While not an exhaustive list, the matrix highlights the different business use cases across sectors and business models, reflecting the adaptability of Lucene/Solr across the various domains of search applications and use cases.

Columns: Verticals | Users (Customer Facing, Internal) | Content (Original, Aggregated) | Content Update Frequency (High, Medium, Low) | Access Control

Enterprise (Intranet) √ √ √ √
Education: Schools/Universities √ √ √ √ √ √
Education: Libraries √ √ √ √ √
Job Portals √ √ √ √
Social Networks √ √ √ √ √
Media: News √ √ √ √
Media: Media √ √ √ √
E-Commerce Sites √ √ √ √ √ √
Financial Services √ √ √ √ √
Yellow Pages √ √ √
Horizontal Portals √ √ √ √
Appendix: Lucene/Solr Features and Benefits

Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing the search solution best suited to your requirements, key factors to consider are application scope, development environment, and software development preferences.

Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.

Solr is the Lucene Search Server. It presents a web service layer built atop the Lucene search library, extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.

Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.

Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment.

With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance. In fact, Lucene programmers have often reported that they find Solr contains "the same features I was going to build myself as a framework for Lucene, but already very well implemented." Using Solr, enterprises can customize the search application according to their requirements without the cost and risk of writing the code from scratch.

Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile inside a native Java application. While working with Lucene, programmers can directly control the large set of sophisticated features with low-level access, data, or state manipulation.

Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
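To make the REST-like HTTP interface described above more concrete, the following sketch builds a Solr search URL that requests facet counts and then reads those counts from a JSON response. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters. The base URL, the `title` and `category` field names, and the sample response are assumptions for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

def build_select_url(base_url, query, facet_field, rows=10):
    """Return a Solr select URL asking for results plus facet counts."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),              # ask for a JSON response
        ("facet", "true"),           # turn faceting on
        ("facet.field", facet_field),
    ]
    return f"{base_url}/select?{urlencode(params)}"

def facet_counts(response_text, facet_field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))

# Assumed local Solr address; adjust to the actual deployment.
url = build_select_url("http://localhost:8983/solr", "title:lucene", "category")

# Hand-written stand-in for a server response, shaped like Solr's JSON output.
sample_response = json.dumps({
    "response": {"numFound": 17, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {"category": ["books", 12, "slides", 5]}},
})
counts = facet_counts(sample_response, "category")
```

Because the interface is plain HTTP, the same request can be issued from any language or even a web browser, which is part of what makes Solr quick to integrate.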
As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include:

• Search Quality: Speed, Relevance, and Precision
Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search help users sort, organize, classify, and structure retrieved information to ensure that search delivers the desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.

• Lower Cost and Greater Flexibility, Plug and Play Architecture
Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise's existing infrastructure, reducing costs of installation, configuration, and management.

• Open Source Platform for Portability and Easy Deployment
Because Lucene/Solr is an open source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux, copy it to a Microsoft Windows machine, and search there. This portability enables you to keep your search application in step with your company's evolving infrastructure. Lucene has also been ported to other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems.

• Largest Installed Base of Applications, Increasing Customer Base
Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
• Large Developer Base and Adaptability
As community-developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm lets Lucene/Solr provide developers with the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors.

• Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination
Lucid Imagination provides the expertise, resources, and services that are needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
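Several of the query features named under Search Quality above (fielded searching, proximity operators, wildcards, and term weights) are expressed directly in Lucene's query syntax. The short sketch below assembles one such query string. The operators shown (`field:term`, `"phrase"~N`, `*`, and `^weight`) are standard Lucene query parser syntax, while the helper functions and the `title` and `body` field names are hypothetical.

```python
def fielded(field, clause):
    """Fielded search: restrict a clause to one field."""
    return f"{field}:{clause}"

def phrase_within(phrase, distance):
    """Proximity operator: the phrase's words within `distance` positions."""
    return f'"{phrase}"~{distance}'

def boosted(clause, weight):
    """Term weight: make matches on this clause count more toward relevance."""
    return f"{clause}^{weight}"

query = " OR ".join([
    boosted(fielded("title", "solr"), 2),               # title matches weigh double
    fielded("body", phrase_within("open source", 5)),   # words near each other
    fielded("body", "search*"),                         # wildcard: search, searching, ...
])
```

A string built this way can be passed as the query to Lucene's query parser or as the `q` parameter of a Solr request.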