Open Source Search Applications
Thousands of organizations around the world, including AT&T, Sears, Ford, Verizon, The Guardian, Elsevier, Cisco, Macy’s and more, have found their solution: Lucene/Solr open source, the world’s most popular search technology. Our new white paper “A Manager’s Guide to Real World Open Source Search Applications” provides numerous case studies across various industries and business models to show how real-world businesses have turned Lucene/Solr open source search into competitive advantage.
http://www.lucidimagination.com/files/file/whitepaper/LIWP_LuceneSolrRealWorldSearch.pdf

Published in: Technology

  1. The Case for Lucene/Solr: A Manager’s Guide to Real World Open Source Search Applications. By Lucid Imagination
  2. Abstract
     In today’s information-driven environment, search is a critical solution to problems when it slashes the time and effort separating end users from the data they value. Search spans the range of business models and use cases, from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead. Making the best use of search requires two perspectives: a look at the business requirements for a search application, and a view to new business opportunities created by using search to leverage the organization’s content resources.
     Thousands of organizations across different sectors and business models have harnessed Apache Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this broad adoption is the extraordinary power, scalability, and versatility of open source search technologies.
     This paper provides an overview of both the requirements and the opportunities for search applications. It then explores how real world organizations are successfully using Lucene/Solr search applications to meet those opportunities, presenting how the technology is used for specific business models and use cases across industries. In addition, it offers a baseline for setting search requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open source search technology to the unique needs of their business.
     © 2010, Lucid Imagination
     The Case for Lucene/Solr: Real World Search Applications. A Lucid Imagination White Paper • January 2010
  3. Table of Contents
     • Introduction
     • Understanding Search Opportunities and Requirements
       • What Data and Documents Are You Searching?
       • Who Needs the Results and Why?
       • Where Is Search Integrated with IT Infrastructure?
       • How Is the Search Interface Presented to the User?
     • The Real World: Applications and Case Studies
       • Yellow Pages, Local Search, and Searching Classifieds
       • Media
       • E-commerce
       • Job and Career Sites
       • Libraries, Archives, and Museums (LAMs) Search
       • Social Media Search
       • Enterprise (Intranet) Search
     • Business Use Case Matrix
     • Appendix: Lucene/Solr Features and Benefits
  4. Introduction
     As fast as companies, communities, and consumers produce data (about each other, products, opinions, research, and everything else imaginable), they need faster, more versatile search capabilities to find the information they need to create opportunities for competitive advantage. In today’s information-driven environment, search addresses the critical problems created by the explosive growth of content by slashing the time and effort users expend in finding data they value. Search spans the range of business models and use cases: from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead.
     Apache Lucene/Solr [1] open source search technology has been implemented across the broadest range of applications and business models, and likely in ways that can fit the needs of your organization. In successful operation today at thousands of enterprises, Lucene/Solr technology scales from tens of thousands to hundreds of millions and billions of documents; searches data that is structured, unstructured, or a combination of both; reaches data inside and outside the firewall; and ranges in use from a simple website search box to sophisticated faceted navigation. It addresses equally diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr helps users find, make sense of, and act upon information quickly and efficiently.
     In this white paper, we’ll review real-world case studies for Lucene/Solr functionality across business sectors to demonstrate its versatility and varied applicability. The diversity of examples provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The examples also attest to the innovation and transparency inherent to the open source development model. Our focus is on familiarizing the audience of business managers and application owners with existing Lucene/Solr applications; the substantial technical advantages to developers are covered elsewhere.
     [1] Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix.
  5. We’ll first survey the key requirements and business use cases of search and then look at where they are built into search applications. Our objective is to provide business managers and application owners with a broad perspective on how Lucene/Solr search technology is used to build solutions to compelling business problems. In the Appendix, we provide an overview of Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the broadest range of business needs.
     Understanding Search Opportunities and Requirements
     Search technology has come a long way from its roots in matching keywords against their appearance in documents and returning undifferentiated results. Search today empowers users by delivering actionable information quickly and efficiently, across multiple, diverse sources of data. The business use cases range from executing mission critical commercial transactions (e.g., e-commerce sites) to unlocking employee and end-user productivity in the search for a single relevant document (e.g., enterprise search).
     Given the breadth of the problem domain, it’s useful to look at search and ask two fundamental questions: “How can it solve my business problems?” and “What new business opportunities can search open up?”
     In considering how search technology solves business problems, it is useful to start by spelling out the requirements you’ll need to consider for your search application. At the same time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as they can help open up new frontiers for incorporating search and leveraging more value from data repositories.
     Starting with some basic questions (what, who, where, and how), you can clarify the high-level requirements specific to your business needs, which in turn allow you to make the best decisions for your search application. The process of looking at the fundamentals also raises new questions about how and where the search technology offered by Lucene and Solr can create new business opportunities.
     Let’s look at four fundamental questions you should address in understanding search opportunities and requirements:
     • What data and documents are you searching?
     • Who needs the results and why?
     • Where is search integrated with IT infrastructure?
     • How is the search interface presented to the user?
  6. What Data and Documents Are You Searching?
     Business today is driven more than ever by end users’ creation and consumption of real-time information. A key differentiating capability of search technology is ingesting a broad range of content types and processing large collections of diverse data in real time in order to deliver actionable information. Two aspects to consider:
     • Types of Content
       Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint presentations, Excel spreadsheets, Word documents, log files, multimedia content, and more. Content resides in various repositories, including databases, file servers, content management systems, archiving systems, collaboration applications, and employee desktops and laptops. Search technology must be able to locate, organize, and aggregate data whatever its form or location.
     • Frequency of Updating Content
       Organizations update content at varying intervals, driven by differing business processes and models. Social media or news applications have real-time content needs, whereas an e-commerce application might re-index in response to new inventory on a batch basis, and a research institution might add to its collection less often still. Search applications need to be adaptable to these differences in content change frequency.
     Who Needs the Results and Why?
     Business search puts a high priority on the end user experience and on results in which the searched content is tuned to the unique needs of each user. After all, the human dimension (the usefulness of results and the efficacy of interaction) is the acid test of a search application. Internet search applications like Google, Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities of the search experience, but they solve a very different problem.
     While Internet searches can produce millions of results in milliseconds, they rely on measures like website popularity or URLs and domain names, which are neither relevant nor generally applicable to purpose-built applications for businesses. What’s more, they rely on generalizing relevancy for a global population of all Internet users, without being tied to business rules, business process logic, or the opportunity cost of improved precision for a specific set of data or search users.
     Business search applications cannot rely on such coarse, brute force approaches to tune their results. They need far more control and precision. They have to be able to deliver highly useful results while matching, if not exceeding, the levels of user experience that people have come to expect by virtue of their daily interactions with commercial search engines. Key points of consideration from a business perspective are:
  7. • Relevance
       Relevance is entirely a factor of the goals of the search application’s users. The application must have the mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easier ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper will make results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow search scope can also lead to a richer set of more useful results, with less time and effort.
     • Cost of Relevance
       As business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to search, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should help make employees more productive by reducing the amount of time and effort needed to find the documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.
     • Precision Ranking
       Result accuracy, sorted by attributes like relevance, date, field, or any document property, makes the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.
     • Query Response Speed
       Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. Fast, relevant results cannot come from search technology hamstrung by data influx or query overload. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.
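     The filtering and sorting mechanisms described under Relevance can be sketched as a request to Solr's HTTP query interface. This is a minimal illustration, not a recommended configuration: the host, the `products` core, and the `category` and `price` fields are hypothetical, while `q`, `fq`, `sort`, and `rows` are standard Solr request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, text, filters=(), sort=None, rows=10):
    """Build a Solr select URL from a query, optional filters, and a sort."""
    params = [("q", text), ("rows", str(rows))]
    # Each filter query (fq) narrows the result set without changing scoring.
    params.extend(("fq", f) for f in filters)
    if sort is not None:
        params.append(("sort", sort))
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/products/select",
    "running shoes",
    filters=["category:footwear", "price:[50 TO 100]"],
    sort="price asc",
)
# Decode the URL again to confirm the parameters survive encoding.
parsed = parse_qs(urlsplit(url).query)
print(url)
print(parsed["fq"])
```

     Keeping the free-text query separate from the filters mirrors the point made above: users type imperfect query terms, and the application narrows scope for them through structured filters and sort order.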
  8. Where Is Search Integrated with IT Infrastructure?
     Useful, valuable search technology rarely exists in isolation. Searched data is transformed into actionable information when it is integrated with the organization’s information infrastructure, from business processes to business intelligence to content management systems. A robust search technology must be customizable to integrate seamlessly with the existing systems.
     • Application Integration
       A key requirement for a search application is its extensibility for integration with existing infrastructure and applications like content management systems, databases, and the full range of business processes and applications. It should have interfaces that support ingestion of data as well as delivery of results in readily consumable formats, because in many cases results are consumed by other applications, not a human.
     • Scalability
       We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the count of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, many times resulting in negative return.
     • Security
       Every organization has its own security requirements and access controls. Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.
     How Is the Search Interface Presented to the User?
     The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:
  9. • Navigation
       Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.
       The Netflix search application is powered by Solr; it adds a fuzzy dimension to search, with auto-completion of movie names, correction of misspelled actor names, and suggestions of the titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.
     • Discovery
       Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in search results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load the document are great ways to significantly improve usability.
     • Intuitive Intelligence
       Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).
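     The faceted navigation described under Navigation comes down to two operations: counting how many results fall under each value of a field, and narrowing the results when the user picks one of those values. The sketch below shows the idea in plain Python over a small in-memory result list. The listings and field names are invented for illustration, and this is not how Solr implements faceting internally; in Solr, facet counts are requested with parameters such as `facet=true` and `facet.field`.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each value of the given field."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return dict(counts)


def apply_facet(documents, field, value):
    """Keep only the documents whose field matches the chosen facet value."""
    return [doc for doc in documents if doc.get(field) == value]


# Hypothetical search results, for illustration only.
results = [
    {"title": "Listing A", "category": "plumber", "city": "Austin"},
    {"title": "Listing B", "category": "plumber", "city": "Dallas"},
    {"title": "Listing C", "category": "electrician", "city": "Austin"},
]

print(facet_counts(results, "category"))
print([doc["title"] for doc in apply_facet(results, "city", "Austin")])
```

     Showing the counts next to each facet value lets users see where a click will lead before they commit to it, which is what makes drilling down feel guided rather than like guessing at query terms.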
  10. The Real World: Applications and Case Studies
     With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.
     All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission critical capability that turns search into a robust competitive advantage.
     For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs. Business use case studies we’ll review include:
     • Yellow Pages, Local Search, and Searching Classifieds
     • Media
     • E-commerce
     • Job and Career Sites
     • Libraries, Archives, and Museums (LAMs) Search
     • Social Media Search
     • Enterprise (Intranet) Search
  11. Yellow Pages, Local Search, and Searching Classifieds
     In the business of online local search, geographic (location-based) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.
     Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search algorithms that are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer, or pay for exposure to, proprietary back-end logic.
     Requirements
     • Intelligent results going beyond keyword search
     • Deeper, faceted navigation
     • Seamless integration with latest Web 2.0 tools
     • Lower IT-related costs
     • Geocentric user experience
     • Search numeric values
     Solr Solution
     • Customizable search index which can be tuned transparently to account for key findability drivers
     • Drop down filters for narrowing or widening the scope of search
     • Seamless integration with existing technologies
     • Native numeric encoding and search capabilities
     • Reduced server footprint for lower TCO than most commercial vendors
     “Internet Yellow Pages and local online search is forecast to grow to $27.8 billion in 2011.” The Kelsey Report [1]
     Success Stories
     • YP.com, a division of AT&T Interactive
     • Zvents.com, local event search service
     • Yelp.com, the community local search site
     [1] The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook
  12. Case Study 1: yp.com by AT&T Interactive
     AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com, an online business listing and advertising site, was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.
     Business Needs
     • Addressing the need to factor in location to support geographic search, and include relevant comments
     • Striking a balance between organic search and advertised content
     • Indexing highly unstructured content such as user comments
     • Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
     • Linguistic support to improve the search experience, such as spellchecking, synonyms, find-similar, etc.
     • Integrating with latest Web 2.0 tools
     • Reducing server footprint
     The Solr Solution
     • Context-specific relevancy, geographic proximity, ad placement, and user comments
     • Faceting and drop down filters to narrow or widen the scope of search
     • Functional support for creating new features
     • Spell-correction and location-optimized search results that show users the businesses nearest to them first
     • Seamless integration with many Web 2.0 tools to create innovative features and mashups
     • Lower TCO from reducing the number of search servers from 120 to two dozen
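     The "nearest businesses first" behaviour in this case study can be illustrated by ordering results by great-circle distance from the user. The sketch below uses the haversine formula in plain Python. It is an assumption-laden illustration and does not describe how yp.com or Solr computes geographic proximity; the business names are invented, and the coordinates are approximate city centres.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_first(businesses, user_location):
    """Return the businesses ordered by distance from the user, closest first."""
    return sorted(businesses, key=lambda b: haversine_km(user_location, b["location"]))


# Hypothetical listings, for illustration only.
businesses = [
    {"name": "Los Angeles shop", "location": (34.0522, -118.2437)},
    {"name": "Oakland shop", "location": (37.8044, -122.2712)},
    {"name": "San Jose shop", "location": (37.3382, -121.8863)},
]
user = (37.7749, -122.4194)  # approximately San Francisco

ranked = nearest_first(businesses, user)
print([b["name"] for b in ranked])
```

     A production directory would typically blend distance with the other signals listed above, such as ratings and paid placement, rather than sorting on distance alone.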
  13. Media
     Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation.
     Given that content is the business, the need for a robust search application ties directly to competitive advantage.
     Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It addresses dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class innovative search experience to differentiate themselves in a highly competitive marketplace.
     Requirements
     • Real-time indexing of petabytes of structured and unstructured data
     • Deeper search capability
     • Improved query response time
     • Reduced infrastructure and customization costs
     Solr Solution
     • Reverse indexing
     • Intelligent, faceted search to enable contextual and linguistic relevance
     • Easy configuration for parsing structured and unstructured data
     • Easy and seamless installation for lower TCO
     • Customization with open source code
     “Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.” Doug Steigerwald, McClatchy Interactive
     Success Stories
     • McClatchy Newspapers
     • Netflix
     • Comcast Interactive
     • MTV Networks, a division of Viacom
     • The Motley Fool, fool.com
     • Fanfeedr.com, personalized sports aggregator
Case Study 2

McClatchy: Leading Newspaper Publisher

The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution, to empower the McClatchy audience with the information they wanted and secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value.

Requirements

• Proliferating content and data sources (text, video, audio, images), with real-time streaming
• Empowering end users with ease of use
• Supporting peak traffic and popular search spikes with consistent performance
• Providing scalability for a database growing by orders of magnitude annually
• Providing flexibility to support customization
• Controlling IT costs while exceeding performance benchmarks of competition

The Lucene/Solr Solution

• Deeper content by indexing both structured and unstructured data in real time, effortlessly
• Indexes millions of documents, with search results delivered in milliseconds
• User-friendly navigation with drop-down filters, faceted navigation, linguistic corrections, etc.
• Excellent performance, even in peak hours, by load-balancing search requests across servers
• Scalability without impact on performance
• High degree of customization, since it’s open source
• Integration with existing IT infrastructure, eliminating associated license fees to cut costs
• 8-fold reduction in server footprint
E-commerce

E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market landscape. By reducing the time and effort required to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience for customers. Search then translates directly into higher revenues and customer loyalty. Instant results, intuitively organized, advanced faceting for easy browsing, synchronizing results with images, and integration with user ratings are among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all enable users to find products they want, driving conversion rates and enabling a winning online experience.

Online retail sales in the B2C market are expected to reach $340 billion by 2013. [2]
Forrester Research

Requirements

• Multidimensional, dynamic search
• Faster results
• Real-time indexing of products
• Faceting and browsing capabilities
• Seamless integration with existing IT infrastructure

Solr Solution

• Faceted search for deeper drill down and browsing
• Intuitive search capabilities for cross-channel shopping experience
• System administration tools for data loading, index replication, monitoring, logging, and cache management
• Query monitoring for better highlighting of popular products

Success Stories

• Buy.com
• Sears.com
• Macys.com
• Zappos.com
• Advanceautoparts.com
• Dollardays.com

[2] “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.
Case Study 3

Zappos

Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than at any other shoe store, on or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution. After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to ensure continued, successful deployment.

Requirements

• Simplified, attractive user experience that makes it easy to find and buy
• Relevant results, fast
• Navigation across attributes, such as size, color, and style, for broader and deeper results
• Indexing products as they are entered in the catalogs
• Cross-functional navigation to give customers a realistic shopping experience
• Intuitive intelligence to provide alternate suggestions
• Analytical capabilities to drive business strategy
• Facilitating control over results
• Integration with existing IT infrastructure

The Solr Solution

• Sub-second search results, across categories
• Faceting, for easy browsing and discovery and a compelling user experience
• Real-time indexing of products
• Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
• Information on user activity to help build strategy on product promotions
• Controls to rank popular or high-stock products in results where users are more likely to buy them
• Facilitates integration with a heterogeneous open source environment
Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability (matching résumés to job listings and vice versa), so both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to navigate through filters to narrow the scope of a candidate search, such as education, previous employer, salary history, skillsets, etc.; a job seeker may want to expose these attributes, but keep a current employer’s name confidential. A job seeker may want to apply to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry by enabling linguistic intelligence (handling identical acronyms that correspond to different entities, variations in spelling, and imperfectly constructed search queries); indexing unstructured data (résumés); and managing ever-growing data.

“I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.”
Peter Keegan, Monster.com

Requirements

• Linguistic intelligence for more relevant results
• Control of search results to maintain privacy
• Deeper search capability
• Numeric search
• Faster query response
• Reduced infrastructure and customization costs

Solr Solution

• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Business process integration and customization with open source code

Success Stories

• Monster
• The Big Jobs
• eBharatJobs
• Careerjet
Monster.com

Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database and served over 63 million job seekers per month. It now runs on average 300 to 400 queries per second with an average response time of 40 milliseconds. To provide the highest level of service and support to its customers, both employers and job seekers, Monster has an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model.

The Requirements

• Managing high volumes of data, continually increasing by double-digit percentages annually
• Maintaining constant inventory updates and providing faster results
• Removing technological barriers that limit the scope of information
• Enabling end users to refine search and drill deeper without any performance impact
• Providing security controls to ensure end user privacy
• Facilitating scalability and flexibility in tandem with the company’s vision and growth plans

The Lucene Solution

• Handles high volumes of data by clustering data to reduce the index size
• Real-time indexing for fresher, faster query results
• Intuitive search to enable in-depth cross-functional job and résumé browsing
• Faceted search and ‘single click’ filters for search refinement
• Security controls to manage user information
• Unlimited scalability and customization leveraging open source licensing
