Is Enterprise Search Ripe for Open Source Disruption?
 

Presentation given by Larry Cannell, Senior Analyst at Burton Group, and Brian Pinkerton, Chief Architect of Lucid Imagination, at Enterprise 2.0 San Francisco 2009.


    Presentation Transcript

    • Is Enterprise Search Ripe for Open Source Disruption? Larry Cannell, Senior Analyst, Burton Group (lcannell@burtongroup.com, www.burtongroup.com); Brian Pinkerton, Chief Architect, Lucid Imagination (www.lucidimagination.com)
    • Open Source Search: Agenda
      • Why Open Source and Search?
      • Enterprise Opportunities to Use Open Source Search
      • Market Analysis
      • Lucid Imagination
    • Can You Tell the Difference?
    • Can You Tell the Difference? Netflix
    • Can You Tell the Difference? CNET
    • Can You Tell the Difference? Best Buy
    • Can You Tell the Difference? Wikipedia
    • Can You Tell the Difference? Monster
    • Which Site Uses Open Source Search? Netflix, CNET, Best Buy, Wikipedia, Monster
    • Which Site Uses Open Source Search? Best Buy, Monster
    • Why Open Source and Search? Lucene and Solr Get Funded
    • Enterprise Opportunities
      • Basic Website/Intranet Search: no compelling reason to use open source; only consider it if you have more headcount than budget
      • Vertical Search: best opportunities for open source search
    • Numerous Options: Beagle, Minion, Sphinx, DataparkSearch, Mnogosearch, Swish-e, egothor, Namazu, Swish++, Htdig, OpenFTS, Terrier, Hounder, regain, Wumpus, Lemur, Red Piranha, Zettair, MG4J, Simplexo
    • Honorable Mention
    • The Short List
    • Lucene Family Tree: Lucene (2000), Nutch (2002), Hadoop (2005), Solr (2005), and Lucene ports
    • Search architecture (diagram): User Interface, Search Engine, Search Administration, Repository, Content Ingestion, Content Set; successive builds of the slide overlay Lucene and then Solr on this architecture
    • Solr’s Potential to Disrupt: The MySQL of search servers?
      • Search server based on Lucene
      • Easy initial setup
      • Web services-like interface (XML over HTTP)
      • Support for non-Java clients
      • Caching, performance tuning, high availability, load balancing
      • Faceted browsing, similar documents
      • Commoditizes vertical search
        • Could have a similar impact on application development as ODBC/JDBC
        • Consider the 1000s of applications enabled by ODBC/JDBC
        • Vertical search can now be applied to almost any application
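Because Solr speaks XML over HTTP, any language with an HTTP library can act as a client. The Python sketch below builds a faceted query against Solr's /select request handler and reads the facet counts out of an XML response. The q, rows, wt, facet, and facet.field parameters are standard Solr request parameters; the server address, the absence of a core name in the path (newer Solr versions expect /solr/<core>/select), and the field names manu and cat are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_select_url(base_url, query, facet_fields=(), rows=10):
    """Build a Solr /select URL. base_url and field names are assumptions."""
    params = [("q", query), ("rows", str(rows)), ("wt", "xml")]
    if facet_fields:
        # facet.field is repeated once per field to facet on.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def parse_facet_counts(xml_text):
    """Return {field: {value: count}} from a Solr XML response body."""
    root = ET.fromstring(xml_text)
    facets = {}
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    for field in root.findall(path):
        facets[field.get("name")] = {
            item.get("name"): int(item.text) for item in field.findall("int")
        }
    return facets
```

For example, build_select_url("http://localhost:8983/solr", "ipod", ["manu", "cat"]) produces http://localhost:8983/solr/select?q=ipod&rows=10&wt=xml&facet=true&facet.field=manu&facet.field=cat. Fetching that URL is left to whatever HTTP client the application already uses.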
    • Open Source Search References
      • Burton Group’s Collaboration and Content Strategies
      • Open Source Search: Bringing Enterprise Search Out into the Open
      • Enterprise Information Search: Transforming Search into an Insight Engine (January 2010)
      • A Complex Query: What’s the Right Enterprise Search Engine?
      • Open Source Communication, Collaboration, and Content Management: Cutting-Edge Innovation, Low-Cost Imitation, or Both?
    • Open Source Search: Brian Pinkerton, Chief Architect
    • Why Open Source for Search?
      • Large scale: billions of documents; hundreds of cluster nodes
        • Uses modern architectures to achieve massive scalability
        • Some of the biggest search indexes are on open source software
      • High performance
        • Fast response time
      • Flexible relevance
        • Use built-in relevance (on par with others) or augment it
      • Stand-alone, integrated, or embedded
      • Mature, yet not stuck in time
        • Continued momentum on all facets of the products
        • Great support from the community
      Lucid Imagination, Inc.
    • Example: Searching Social Media
      • Everyone collaborates with everyone on everything everywhere
      • You’ve heard the hype
        • Much is probably just that
        • But it’s changing Web habits
        • And it’s pushing the state of the art in search
      • Enterprise adoption is trailing the wide Web, but it’s coming
      • Will you be ready?
    • Search is Essential
      • Too much content to navigate without filtering
      • Sometimes, only analytics can do the job
      • Other times, users expect to search, not navigate
      • Used for surfacing more than just plain old search results
    • How is Social Media Transforming Search?
      • 20th Century: business-generated content; searches the attributes; normalized data model; transactional models; batch analytics
      • Web 1.0: power-user content, HTML only; searches the content; flat data model; batch processing; few analytics
      • Web 2.0: user-generated content; searches both, plus the interaction; ad hoc normalization; powered by now; user-driven analysis
    • Examples of Searching Social Media
      • Pioneer in blog searching: Technorati (Lucene → Solr)
      • Analyzing the interaction: Scout Labs (Lucene)
      • Bottom-up relevance: digg (Solr)
      • People are the content: LinkedIn (Lucene)
      • People and places: Yelp (Lucene)
      • Patterns from the people: Xmarks (Lucene)
      • Searching the social universe: MySpace (Lucene.NET)
    • Technorati: Blog Search
      • Technorati is a blog-discovery engine
      • 300,000 new posts per day
      • Surge of posts in the morning
      • Separate indexes for blog and post data
      • Noisy, user-generated content
      • Search used behind the scenes to build the user interface
      • The new index keeps only a limited time window available
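An index that keeps only a limited time window can be sketched as an inverted index that evicts postings as posts age out. This is a hypothetical illustration, not Technorati's implementation: the class name RollingPostIndex, the lowercase alphanumeric tokenizer, and the assumptions that post ids are unique and posts arrive in timestamp order are all choices made for this example.

```python
import re
from collections import defaultdict, deque


class RollingPostIndex:
    """Hypothetical inverted index that keeps only posts newer than `window` seconds.

    Assumes unique post ids and posts added in non-decreasing timestamp order.
    """

    def __init__(self, window):
        self.window = window
        self.postings = defaultdict(set)  # term -> ids of posts still in the window
        self.terms_by_post = {}           # post id -> its terms, needed for eviction
        self.arrivals = deque()           # (timestamp, post id), oldest first

    def add(self, post_id, timestamp, text):
        terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        self.terms_by_post[post_id] = terms
        self.arrivals.append((timestamp, post_id))
        for term in terms:
            self.postings[term].add(post_id)
        self.expire(now=timestamp)

    def expire(self, now):
        # Evict every post that has fallen out of the time window.
        while self.arrivals and self.arrivals[0][0] <= now - self.window:
            _, post_id = self.arrivals.popleft()
            for term in self.terms_by_post.pop(post_id):
                self.postings[term].discard(post_id)
                if not self.postings[term]:
                    del self.postings[term]

    def search(self, term):
        """Return ids of in-window posts containing the term, sorted."""
        return sorted(self.postings.get(term.lower(), ()))
```

Because eviction walks the arrival queue from the oldest end, each post is removed exactly once, so the cost of expiry is proportional to the number of terms being dropped rather than to the size of the index.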
    • Scout Labs: Analyzing the Interaction
      • Scout Labs is a social-media monitoring tool
      • Mines the stream of interaction across many forms of social media: blogs, comments, tweets, forums, mailing lists
      • The interaction can be messy, so Scout Labs provides summaries
        • Analytics provide comparisons
        • Sentiment summarizes attitudes
      • Because of the analytics, must keep more data online, which can get expensive
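Summaries of messy interaction data, such as per-brand mention counts broken down by sentiment, can be sketched with a simple lexicon-based scorer. This is a deliberately naive, swapped-in technique and not Scout Labs' actual method; the tiny POSITIVE and NEGATIVE word lists and the substring-based brand matching are hypothetical.

```python
import re

# Hypothetical, tiny sentiment lexicon; real systems use far larger lexicons or trained models.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "bad"}


def sentiment(text):
    """Positive-word count minus negative-word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def summarize(mentions, brands):
    """mentions: iterable of (source, text). Returns per-brand counts by sentiment label."""
    summary = {b: {"mentions": 0, "positive": 0, "negative": 0, "neutral": 0} for b in brands}
    for _source, text in mentions:
        score = sentiment(text)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        lowered = text.lower()
        for brand in brands:
            # Naive substring match stands in for real entity recognition.
            if brand.lower() in lowered:
                summary[brand]["mentions"] += 1
                summary[brand][label] += 1
    return summary
```

Comparing the per-brand dictionaries side by side is the "analytics provide comparisons" step; keeping the raw mentions around so these summaries can be recomputed is what drives the storage cost the slide mentions.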
    • digg: Bottom-up Relevance
      • Digg shows user-submitted links in real time
      • Users vote up or down on submissions
      • Content is indexed in near-real time
      • Results are scored by a combination of factors (recency, number of diggs, etc.)
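Scoring by a combination of factors is often done as a weighted sum of a text-relevance score and other signals such as votes and recency. The sketch below only illustrates that pattern: the slide does not give digg's formula, so the weights, the logarithmic damping of net votes, and the 12-hour half-life decay are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Submission:
    title: str
    text_score: float  # relevance from the search engine (e.g. a Lucene score)
    up_votes: int
    down_votes: int
    age_hours: float


def combined_score(s, w_text=1.0, w_votes=0.5, w_recency=2.0, half_life_hours=12.0):
    """Hypothetical blend of text relevance, votes, and recency."""
    net_votes = max(s.up_votes - s.down_votes, 0)
    vote_component = math.log1p(net_votes)  # diminishing returns for more votes
    recency_component = 0.5 ** (s.age_hours / half_life_hours)  # halves every half-life
    return w_text * s.text_score + w_votes * vote_component + w_recency * recency_component


def rank(submissions, **weights):
    """Sort submissions by combined score, best first."""
    return sorted(submissions, key=lambda s: combined_score(s, **weights), reverse=True)
```

The log damping keeps a heavily voted old story from permanently outranking fresh ones, while the decay term lets the same vote count matter less as a submission ages.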
    • LinkedIn: People are the Content
      • LinkedIn is a business social network
      • 50 million members
      • Faceted search
        • Facets on location, industries, companies, relationship, etc.
        • Not all are easy to implement
      • Sorting by relevance + relationship
        • Requires significant query-time work
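Faceted search returns, alongside the results, a count of matching documents for each value of each facet field, and lets the user drill down by selecting values. Engines such as Solr compute these counts inside the index; the in-memory Python sketch below only illustrates the idea, with hypothetical profile fields (location, companies) and multi-valued fields represented as lists.

```python
from collections import Counter


def as_values(value):
    """Normalize a field to a list so single- and multi-valued fields work alike."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def facet_counts(docs, facet_fields):
    """Count documents per value for each facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in docs:
        for field in facet_fields:
            counts[field].update(as_values(doc.get(field)))
    return counts


def drill_down(docs, selections):
    """Keep documents matching every selected {field: value} pair."""
    return [
        doc for doc in docs
        if all(value in as_values(doc.get(field)) for field, value in selections.items())
    ]
```

Recomputing facet_counts on the drill-down result gives the updated counts shown after each selection; facets such as relationship degree are harder because they depend on who is searching, not just on the document.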
    • Yelp: People and Places
      • Yelp facilitates user reviews
      • Searches business metadata plus review content
      • Heavy geographic component
      • Results are structured by establishment, but searchable by review
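A heavy geographic component usually means filtering or ranking by distance from the user. The sketch below returns establishments (the result unit) whose review text matches a term and that lie within a radius, using the haversine great-circle distance. The Business record, the substring match on reviews, and the 6371 km mean Earth radius are assumptions for this example, not Yelp's implementation.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


@dataclass
class Business:
    name: str
    lat: float
    lon: float
    reviews: list = field(default_factory=list)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_nearby(businesses, term, lat, lon, radius_km):
    """Businesses within radius_km whose reviews mention term, nearest first."""
    term = term.lower()
    hits = []
    for b in businesses:
        distance = haversine_km(lat, lon, b.lat, b.lon)
        if distance <= radius_km and any(term in r.lower() for r in b.reviews):
            hits.append((distance, b.name))
    return [name for _, name in sorted(hits)]
```

Matching against reviews while returning business names is the "structured by establishment, but searchable by review" shape the slide describes.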
    • Xmarks: Patterns from the People
      • Xmarks provides bookmark sync and Web discovery
      • First provided bookmark sync; adopted by millions of users
      • Aggregates bookmark folder structure and metadata by URL
      • This descriptive content is mined to provide a searchable index
      • Needed new ranking algorithms to provide good relevance and filter out the noise
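Aggregating bookmark metadata by URL turns many users' folder names and titles into one descriptive document per URL. The sketch below is a hypothetical illustration: each bookmark contributes each of its terms once, URLs are ranked by how many bookmarks describe them with the query terms (ties broken by number of distinct users), and a min_users threshold stands in for noise filtering. None of these choices come from the slide.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def aggregate_by_url(bookmarks):
    """bookmarks: iterable of (user, url, folder_path, title) tuples (hypothetical shape)."""
    term_counts = defaultdict(Counter)  # url -> term -> number of bookmarks using the term
    users = defaultdict(set)            # url -> distinct users who bookmarked it
    for user, url, folder_path, title in bookmarks:
        users[url].add(user)
        # Count each term once per bookmark so one verbose folder tree cannot dominate.
        term_counts[url].update(set(tokenize(" ".join(folder_path) + " " + title)))
    return term_counts, users


def search(term_counts, users, query, min_users=1):
    """Rank URLs by matching bookmark terms; skip URLs saved by fewer than min_users users."""
    query_terms = tokenize(query)
    scored = []
    for url, counts in term_counts.items():
        if len(users[url]) < min_users:
            continue  # crude noise filter
        score = sum(counts[t] for t in query_terms)
        if score:
            scored.append((-score, -len(users[url]), url))
    return [url for _, _, url in sorted(scored)]
```

Here the folder names users chose act as crowd-sourced labels for a page, which is why a URL can match a query even when its own title does not.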
    • MySpace: Searching It All
      • MySpace does it all:
        • Many content types from all over the site
        • User-generated content + user interactions
      • Near real time
        • New content and users arriving 24x7
      • Both end-user and administrative functions
        • Admin functions include log file searching
        • Automated tasks help identify spam and other problems
      • Massive scale: billions of records, petabytes of source data
        • New content at the rate of 1 TB every week
    • Social Media is Pushing Search in New Directions
      • Searches the product of interaction among users, not just content
      • Aggregates data from multiple sources at search time
      • Operates in real time, as data is produced
      • Extends the traditional notions of relevance
      • Builds analytics on top of search
      • And... you can build all of this on open source products!