
Is Enterprise Search Ripe for Open Source Disruption?



Presentation given by Larry Cannell, Senior Analyst at Burton Group, and Brian Pinkerton, Chief Architect of Lucid Imagination, at Enterprise 2.0 San Francisco 2009.


Transcript of "Is Enterprise Search Ripe for Open Source Disruption?"

  1. Is Enterprise Search Ripe for Open Source Disruption?
      Larry Cannell, Senior Analyst, Burton Group (lcannell@burtongroup.com, www.burtongroup.com)
      Brian Pinkerton, Chief Architect, Lucid Imagination (www.lucidimagination.com)
  2. Open Source Search: Agenda
      • Why Open Source and Search?
      • Enterprise Opportunities to Use Open Source Search
      • Market Analysis
      • Lucid Imagination
  3. Open Source Search: Agenda (repeat of slide 2)
  4. Can You Tell the Difference?
  5. Can You Tell the Difference? Netflix
  6. Can You Tell the Difference? CNET
  7. Can You Tell the Difference? Best Buy
  8. Can You Tell the Difference? Wikipedia
  9. Can You Tell the Difference? Monster
  10. Which Site Uses Open Source Search? Netflix, CNET, Best Buy, Wikipedia, Monster
  11. Which Site Uses Open Source Search? Best Buy, Monster
  12. Why Open Source and Search? Lucene and Solr Get Funded
  13. Open Source Search: Agenda (repeat of slide 2)
  14. Enterprise Opportunities: Basic Website/Intranet Search
  15. Enterprise Opportunities: Basic Website/Intranet Search; Vertical Search
  16. Enterprise Opportunities
      • Basic Website/Intranet Search: No compelling reason to use open source. Only consider if you have more headcount than budget.
      • Vertical Search
  17. Enterprise Opportunities
      • Basic Website/Intranet Search: No compelling reason to use open source. Only consider if you have more headcount than budget.
      • Vertical Search: Best opportunities for open source search
  18. Open Source Search: Agenda (repeat of slide 2)
  19. Numerous Options: Beagle, Minion, Sphinx, DataparkSearch, Mnogosearch, Swish-e, egothor, Namazu, Swish++, Htdig, OpenFTS, Terrier, Hounder, regain, Wumpus, Lemur, Red Piranha, Zettair, MG4J, Simplexo
  20. Honorable Mention
  21. The Short List
  22. The Short List
  23. Lucene Family Tree: Lucene; Lucene ports
  24. Lucene Family Tree: Lucene (2000); Nutch (2002); Hadoop (2005); Lucene ports
  25. Lucene Family Tree: Lucene (2000); Nutch (2002); Hadoop (2005); Solr (2005); Lucene ports
  26. [Diagram: search application components: User Interface, Search, Administration, Search Engine, Repository, Content Ingestion, Content Set]
  27. [Same diagram, labeled Lucene]
  28. [Same diagram, labeled Solr]
  29. Solr’s Potential to Disrupt
      The MySQL of search servers?
      • Search server based on Lucene
      • Easy initial setup
      • Web services-like interface (XML over HTTP)
      • Support for non-Java clients
      • Caching, performance tuning, high availability, load balancing
      • Faceted browsing, similar documents
  30. Solr’s Potential to Disrupt
      The MySQL of search servers?
      • Search server based on Lucene
      • Easy initial setup
      • Web services-like interface (XML over HTTP)
      • Support for non-Java clients
      • Caching, performance tuning, high availability, load balancing
      • Faceted browsing, similar documents
      • Commoditizes vertical search
      • Could have an impact on application development similar to that of ODBC/JDBC
      • Consider the thousands of applications enabled by ODBC/JDBC
      • Vertical search can now be applied to almost any application
  31. Open Source Search: References
      • Burton Group’s Collaboration and Content Strategies
        • Open Source Search: Bringing Enterprise Search Out into the Open
        • Enterprise Information Search: Transforming Search into an Insight Engine (January 2010)
        • A Complex Query: What’s the Right Enterprise Search Engine?
        • Open Source Communication, Collaboration, and Content Management: Cutting-Edge Innovation, Low-Cost Imitation, or Both?
  32. Open Source Search
      Brian Pinkerton, Chief Architect
  33. Why Open Source for Search?
      • Large scale: billions of documents; hundreds of cluster nodes
      • Uses modern architectures to achieve massive scalability
      • Some of the biggest search indexes are on open source software
      • High performance: fast response time
      • Flexible relevance: use built-in relevance (on par with others) or augment
      • Stand-alone, integrated, or embedded
      • Mature, yet not stuck in time: continued momentum on all facets of the products
      • Great support from the community
      Lucid Imagination, Inc.
  34. Example: Searching Social Media
      • Everyone collaborates with everyone on everything everywhere
      • You’ve heard the hype
      • Much is probably just that
      • But it’s changing Web habits
      • And it’s pushing the state of the art in search
      • Enterprise adoption is trailing the wide Web, but it’s coming
      • Will you be ready?
  35. Search is Essential
      • Too much content to navigate without filtering
      • Sometimes, only analytics can do the job
      • Other times, users expect to search, not navigate
      • Used for surfacing more than just plain old search results
  36. How is Social Media Transforming Search?
      20th Century                 Web 1.0                          Web 2.0
      Business-generated content   Power-user content; HTML only    User-generated content
      Searches the attributes      Searches the content             Both, plus the interaction
      Normalized data model        Flat data model                  Ad hoc normalization
      Transactional models         Batch processing                 Powered by now
      Batch analytics              Few analytics                    User-driven analysis
  37. Examples of Searching Social Media
      • Pioneer in blog searching: Technorati (Lucene → Solr)
      • Analyzing the interaction: Scout Labs (Lucene)
      • Bottom-up relevance: digg (Solr)
      • People are the content: LinkedIn (Lucene)
      • People and places: Yelp (Lucene)
      • Patterns from the people: Xmarks (Lucene)
      • Searching the social universe: MySpace (Lucene.NET)
  38. Technorati: Blog Search
      Technorati is a blog-discovery engine
      • 300,000 new posts per day
      • Surge of posts in the morning
      • Separate indexes for blog and post data
      • Noisy, user-generated content
      • Search used behind the scenes to build the user interface
      • New index keeps only a limited time window available
  39. Scout Labs: Analyzing the Interaction
      Scout Labs is a social-media monitoring tool
      • Mines the stream of interaction across many forms of social media: blogs, comments, tweets, forums, mailing lists
      • The interaction can be messy, so Scout Labs provides summaries
      • Analytics provide comparisons
      • Sentiment summarizes attitudes
      • Because of the analytics, must keep more data online; this can get expensive
  40. digg: Bottom-up Relevance
      Digg shows user-submitted links in real time
      • Users vote up or down on submissions
      • Content is indexed in near-real time
      • Results are scored by a combination of factors (recency, number of diggs, etc.)
  41. LinkedIn: People are the Content
      LinkedIn is a business social network
      • 50 million members
      • Faceted search
        • facets on location, industries, companies, relationship, etc.
        • not all are easy to implement
      • Sorting by relevance + relationship
        • requires significant query-time work
  42. Yelp: People and Places
      Yelp facilitates user reviews
      • Searches business meta-data plus review content
      • Heavy geographic component
      • Results are structured by establishment, but searchable by review
  43. Xmarks: Patterns from the People
      Xmarks provides bookmark sync and Web discovery
      • First provided bookmark sync; adopted by millions of users
      • Aggregates bookmark folder structure and meta-data by URL
      • This descriptive content is mined to provide a searchable index
      • Needed new ranking algorithms to provide good relevance and filter out the noise
  44. MySpace: Searching it all
      MySpace does it all:
      • Many content types from all over the site
      • User-generated content + user interactions
      • Near Real Time
      • New content and users arriving 24x7
      • Both end-user and administrative functions
        • admin functions include log file searching
        • automated tasks help identify spam, other problems
      • Massive scale: billions of records, petabytes of source data
        • new content at the rate of 1TB every week
  45. Social Media is Pushing Search in New Directions
      • Searches the product of interaction among users, not just content
      • Aggregates data from multiple sources at search time
      • Operates in real time, as data is produced
      • Extends the traditional notions of relevance
      • Builds analytics on top of search
      • and... you can build all of this on open source products!
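Slides 26 to 28 lay out the components of a search application, from content ingestion through the repository and search engine to the user interface. The sketch below makes the ingestion and search-engine boxes concrete. It assumes a toy in-memory inverted index with plain TF-IDF scoring. It is an illustration under that assumption, not how Lucene actually stores or scores documents, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class ToyIndex:
    """Hypothetical inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def ingest(self, doc_id, text):
        # Content ingestion: tokenize a document and record it in the repository (the index).
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        # Search engine: score each document by summed tf * idf over the query terms.
        n = len(self.docs)
        scores = defaultdict(float)
        for term in tokenize(query):
            hits = self.postings.get(term, {})
            if not hits:
                continue
            idf = math.log(1 + n / len(hits))  # rarer terms weigh more
            for doc_id, tf in hits.items():
                scores[doc_id] += tf * idf
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc_id for doc_id, _ in ranked[:k]]
```

For example, ingest "open source search with Lucene" as "a" and "enterprise search appliance" as "b". A search for "lucene search" then ranks "a" ahead of "b", because "lucene" appears in fewer documents than "search" and so carries a higher IDF weight. Re-ingesting an existing doc id is not handled in this sketch.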
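Slide 30 credits Solr's "XML over HTTP" interface with opening it to non-Java clients. The minimal sketch below shows what such a client sends, using only Python's standard library. It builds a query URL for Solr's select handler with faceting and MoreLikeThis (similar documents) switched on. The host, port, path, and field names (`category`, `title`) are assumptions for illustration; check them against your own Solr installation and version.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, facet_fields=(), similar_field=None, rows=10):
    """Build a GET URL for a Solr search request (nothing is sent here)."""
    params = [("q", query), ("rows", str(rows)), ("wt", "xml")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field whose values should be counted.
        params.extend(("facet.field", f) for f in facet_fields)
    if similar_field:
        # MoreLikeThis: ask for documents similar to each hit, compared on this field.
        params.extend([("mlt", "true"), ("mlt.fl", similar_field)])
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


url = build_solr_query_url(
    "http://localhost:8983/solr",  # assumed local Solr instance
    "open source search",
    facet_fields=["category"],     # hypothetical field name
    similar_field="title",         # hypothetical field name
)
print(url)
```

Passing the URL to `urllib.request.urlopen` would return Solr's XML response. Any language with an HTTP client and an XML parser can consume that response, which is the point the slide makes about non-Java clients. Older Solr releases served a single index at `/solr/select`; newer ones generally expect a core or collection name in the path.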
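Slide 38 notes that Technorati kept separate indexes for blog and post data, and that the new index keeps only a limited time window available. The slide's 300,000 posts per day works out to roughly 3.5 posts per second on average (300,000 ÷ 86,400 s), with morning surges well above that. One way to picture a time-limited index is a post store that evicts entries older than a fixed window. This is a hypothetical sketch, not Technorati's design: it assumes posts arrive in timestamp order and uses a linear scan where a real engine would use an inverted index per time slice.

```python
from collections import deque


class RecentPostIndex:
    """Hypothetical time-windowed index: only posts newer than the window stay searchable."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.posts = deque()  # (timestamp, post_id, text); assumes non-decreasing timestamps

    def add(self, timestamp, post_id, text):
        self.posts.append((timestamp, post_id, text.lower()))
        self._evict(timestamp)

    def _evict(self, now):
        # The oldest posts sit at the left end; drop those that aged out of the window.
        while self.posts and self.posts[0][0] <= now - self.window:
            self.posts.popleft()

    def search(self, term, now):
        self._evict(now)
        term = term.lower()
        return [pid for _, pid, text in self.posts if term in text.split()]
```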
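Slide 40 says digg scored results by a combination of factors such as recency and number of diggs. The slide does not give the formula, so the weights and decay shape below are invented for illustration. In this sketch, a text-relevance score is boosted by the log of the vote count and multiplied by an exponential time decay with a hypothetical 24-hour half-life.

```python
import math


def combined_score(text_score, diggs, age_hours, half_life_hours=24.0, vote_weight=0.5):
    """Hypothetical blend of relevance, popularity, and recency (not digg's actual formula)."""
    popularity = 1.0 + vote_weight * math.log1p(max(diggs, 0))  # diminishing returns on votes
    recency = 0.5 ** (age_hours / half_life_hours)              # halves every half-life
    return text_score * popularity * recency


stories = [
    # (story id, text relevance, diggs, age in hours): made-up sample values
    ("old-popular", 1.0, 500, 72),
    ("fresh-modest", 1.0, 20, 2),
]
ranked = sorted(stories, key=lambda s: combined_score(*s[1:]), reverse=True)
print([s[0] for s in ranked])
```

With these made-up numbers, the three-day-old story's large vote count does not outweigh three half-lives of decay, so the fresh story ranks first. Tuning the half-life is how a scheme like this trades freshness against popularity.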
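Slide 41 describes LinkedIn's faceted search over location, industry, company, and relationship. At heart, a facet count is a tally of field values over the current result set, and drilling down filters that set. The sketch below shows the kind of computation an engine performs when faceting is requested. The member records and field names are hypothetical.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count how many matching records carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for record in results:
        for field in fields:
            value = record.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def apply_filter(results, field, value):
    """Drill down: keep only records whose field equals the chosen facet value."""
    return [r for r in results if r.get(field) == value]


members = [  # hypothetical search results
    {"name": "A", "location": "San Francisco", "industry": "Software"},
    {"name": "B", "location": "New York", "industry": "Software"},
    {"name": "C", "location": "San Francisco", "industry": "Finance"},
]
print(facet_counts(members, ["location", "industry"]))
sf = apply_filter(members, "location", "San Francisco")
print(facet_counts(sf, ["industry"]))
```

A facet such as relationship depends on who is searching, so it cannot be stored once per document the way location can. That is one plausible reading of the slide's remarks that not all facets are easy to implement and that relationship-aware sorting requires significant query-time work.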
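Slide 42 describes Yelp results that are structured by establishment but searchable by review, with a heavy geographic component. The hypothetical sketch below has that shape. It matches review text, rolls the hits up to their business, and keeps only businesses within a radius, using great-circle distance from the haversine formula. The data layout, field names, and ranking rule are assumptions, not Yelp's implementation.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_businesses(term, reviews, businesses, lat, lon, radius_km):
    """Match reviews, group hits by business, filter by distance, rank by match count."""
    matches = defaultdict(int)
    for review in reviews:
        if term.lower() in review["text"].lower().split():
            matches[review["business_id"]] += 1
    nearby = []
    for biz_id, count in matches.items():
        biz = businesses[biz_id]
        dist = haversine_km(lat, lon, biz["lat"], biz["lon"])
        if dist <= radius_km:
            nearby.append((biz["name"], count, round(dist, 1)))
    # More matching reviews first; ties broken by distance.
    return sorted(nearby, key=lambda t: (-t[1], t[2]))
```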
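Slide 43 says Xmarks aggregated bookmark folder structure and metadata by URL and mined that descriptive content into a searchable index. The hypothetical sketch below treats the folder names users filed a URL under as terms describing that URL, then ranks URLs by how many bookmarks used a term. The minimum-vote threshold is a simple stand-in for the noise filtering the slide mentions; it is not the new ranking algorithms Xmarks built, which the slide does not describe.

```python
import re
from collections import Counter, defaultdict


def build_url_index(bookmarks):
    """Aggregate folder-path terms per URL: url -> Counter(term -> number of bookmarks)."""
    index = defaultdict(Counter)
    for url, folder_path in bookmarks:
        # Each bookmark contributes each distinct folder term at most once.
        terms = {t for t in re.split(r"[^a-z0-9]+", folder_path.lower()) if t}
        index[url].update(terms)
    return index


def search_urls(index, term, min_votes=2):
    """Rank URLs by how many bookmarks filed them under `term`; drop rare (noisy) labels."""
    term = term.lower()
    hits = [(url, counts[term]) for url, counts in index.items() if counts[term] >= min_votes]
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```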
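Slide 44 puts MySpace's intake of new content at 1TB every week. A back-of-the-envelope check shows the average rate that near-real-time indexing has to sustain. This assumes decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate, which real traffic will not have.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def average_rate_mb_per_s(bytes_per_week):
    """Average ingest rate in decimal megabytes per second for a weekly volume."""
    return bytes_per_week / SECONDS_PER_WEEK / 1e6


rate = average_rate_mb_per_s(1e12)  # 1 TB per week
print(f"{rate:.2f} MB/s")  # prints "1.65 MB/s"
```

Peaks will sit well above this average, so capacity planning would start from peak rather than mean throughput.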
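Slide 45 lists aggregating data from multiple sources at search time among the new directions. The minimal sketch below shows the merge step. It assumes each source already returns (id, score) hits sorted by descending score on a comparable scale; in practice, scores from different engines usually need normalizing first. `heapq.merge` interleaves the sorted lists lazily, and only the top k are materialized.

```python
import heapq
from itertools import islice


def merge_results(result_lists, k):
    """Merge per-source (id, score) lists, each sorted by descending score, into one top-k list."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, k))


blogs = [("blog:1", 0.9), ("blog:2", 0.4)]    # made-up sample hits
tweets = [("tweet:7", 0.8), ("tweet:3", 0.5)]
forums = [("forum:2", 0.95)]
print(merge_results([blogs, tweets, forums], 3))
```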