Presentation given by Larry Cannell, Senior Analyst at Burton Group, and Brian Pinkerton, Chief Architect at Lucid Imagination, at Enterprise 2.0 San Francisco 2009.
Is Enterprise Search Ripe for Open Source Disruption?
1. Is Enterprise Search Ripe for Open Source Disruption?
Larry Cannell
Senior Analyst
Burton Group
lcannell@burtongroup.com
www.burtongroup.com
Brian Pinkerton
Chief Architect
Lucid Imagination
www.lucidimagination.com
2. Open Source Search
Agenda
• Why Open Source and Search?
• Enterprise Opportunities to Use Open Source Search
• Market Analysis
• Lucid Imagination
13. Open Source Search
Agenda
• Why Open Source and Search?
• Enterprise Opportunities to Use Open Source Search
• Market Analysis
• Lucid Imagination
17. Enterprise Opportunities
Basic Website/Intranet Search
• No compelling reason to use open source
• Only consider if you have more headcount than budget
Vertical Search
• Best opportunities for open source search
18. Open Source Search
Agenda
• Why Open Source and Search?
• Enterprise Opportunities to Use Open Source Search
• Market Analysis
• Lucid Imagination
25. Lucene Family Tree
Lucene (2000)
• Lucene Ports
• Nutch (2002)
• Hadoop (2005)
• Solr (2005)
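26. [Diagram: components of a search system: User Interface, Search Engine, Search Administration, Repository, Content Ingestion, Content Set]
27. [Diagram: the same components (User Interface, Search Engine, Search Administration, Repository, Content Ingestion, Content Set), with the portion labeled Lucene marked]
28. [Diagram: the same components (User Interface, Search Engine, Search Administration, Repository, Content Ingestion, Content Set), with the portion labeled Solr marked]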
30. Solr’s Potential to Disrupt
The MySQL of search servers?
• Search server based on Lucene
• Easy initial setup
• Web services-like interface (XML over HTTP)
• Support for non-Java clients
• Caching, performance tuning, high-availability, load balancing
• Faceted browsing, similar documents
• Commoditizes vertical search
• Could have similar impact on application development as ODBC/JDBC
• Consider the 1000s of applications enabled by ODBC/JDBC
• Vertical search can now be applied to almost any application
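To make the "XML over HTTP" and faceted-browsing bullets concrete, here is a minimal sketch of how any client, Java or not, might compose a Solr search request. The parameter names (`q`, `rows`, `wt`, `facet`, `facet.field`) are standard Solr query parameters; the host, port, and the field names `category` and `author` are assumptions chosen for illustration, not details from the talk. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, facet_fields, rows=10):
    """Return a Solr select URL asking for matches plus facet counts."""
    params = [
        ("q", query),        # the user's search terms
        ("rows", rows),      # how many results to return
        ("wt", "xml"),       # response format: XML over HTTP
        ("facet", "true"),   # turn on faceted browsing
    ]
    # One facet.field parameter per field to count values for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local server and field names.
url = build_solr_query_url(
    "http://localhost:8983/solr", "open source search", ["category", "author"]
)
print(url)
```

Because the request is a plain HTTP GET, the same URL could be issued from a browser, a shell script, or any language with an HTTP library, which is what makes the non-Java client support in the list above possible.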
31. Open Source Search
References
• Burton Group’s Collaboration and Content Strategies
• Open Source Search: Bringing Enterprise Search Out into the Open
• Enterprise Information Search: Transforming Search into an Insight Engine (January 2010)
• A Complex Query: What’s the Right Enterprise Search Engine?
• Open Source Communication, Collaboration, and Content Management: Cutting-Edge Innovation, Low-Cost Imitation, or Both?
33. Why Open Source for Search?
• Large scale: billions of documents; hundreds of cluster nodes
• Uses modern architectures to achieve massive scalability
• Some of the biggest search indexes are on open source software
• High Performance
• Fast response time
• Flexible relevance
• Use built-in relevance (on par with others) or augment
• Stand-alone, integrated, or embedded
• Mature, yet not stuck in time
• Continued momentum on all facets of the products
• Great support from the community
Lucid Imagination, Inc.
34. Example: Searching Social Media
• Everyone collaborates with everyone on everything everywhere
• You’ve heard the hype
• Much is probably just that
• But it’s changing Web habits
• And it’s pushing the state of the art in search
• Enterprise adoption is trailing the wide Web, but it’s coming
• Will you be ready?
35. Search is Essential
• Too much content to navigate without filtering
• Sometimes, only analytics can do the job
• Other times, users expect to search, not navigate
• Used for surfacing more than just plain old search results
36. How is Social Media Transforming Search?
20th Century | Web 1.0 | Web 2.0
Business-generated content | Power-user content; HTML only | User-generated content
Searches the attributes | Searches the content | Both, plus the interaction
Normalized data model | Flat data model | Ad hoc normalization
Transactional models | Batch processing | Powered by now
Batch analytics | Few analytics | User-driven analysis
37. Examples of Searching Social Media
• Pioneer in blog searching: Technorati (Lucene → Solr)
• Analyzing the Interaction: Scout Labs (Lucene)
• Bottom-up relevance: digg (Solr)
• People are the content: LinkedIn (Lucene)
• People and places: Yelp (Lucene)
• Patterns from the people: Xmarks (Lucene)
• Searching the Social Universe: MySpace (Lucene.NET)
38. Technorati: Blog Search
• Technorati is a blog-discovery engine
• 300,000 new posts per day
• Surge of posts in the morning
• Separate indexes for blog and post data
• Noisy, user-generated content
• Search used behind the scenes to build the user interface
• New index keeps only a limited time available
39. Scout Labs: Analyzing the Interaction
• Scout Labs is a social-media monitoring tool
• Mines the stream of interaction across many forms of social media: blogs, comments, tweets, forums, mailing lists
• The interaction can be messy, so Scout Labs provides summaries
• Analytics provide comparisons
• Sentiment summarizes attitudes
• Because of the analytics, must keep more data online - this can get expensive
40. digg: Bottom-up Relevance
• Digg shows user-submitted links in real time
• Users vote up or down on submissions
• Content is indexed in near-real time
• Results are scored by a combination of factors (recency, number of diggs, etc.)
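One way to picture "scored by a combination of factors" is a function that blends text relevance with vote count and recency. The sketch below is purely illustrative: the formula, the 24-hour half-life, and the names `score`, `text_relevance`, `diggs`, and `age_hours` are assumptions for this example and do not describe Digg's actual ranking.

```python
import math


def score(text_relevance, diggs, age_hours, half_life_hours=24.0):
    """Blend text relevance with popularity and recency (hypothetical formula)."""
    # Dampen vote counts so one very popular story cannot swamp everything.
    popularity = math.log1p(diggs)
    # Exponential decay: the recency factor halves every half_life_hours.
    recency = 0.5 ** (age_hours / half_life_hours)
    return text_relevance * (1.0 + popularity) * recency


# A fresh story with few votes versus an older story with many votes.
fresh = score(text_relevance=1.0, diggs=10, age_hours=1)
stale = score(text_relevance=1.0, diggs=500, age_hours=72)
print(fresh > stale)
```

The design point is that popularity and freshness are applied as multipliers on top of the engine's own relevance score, so a story still has to match the query before votes or age matter.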
41. LinkedIn: People are the Content
• LinkedIn is a business social network
• 50 million members
• Faceted search
  • facets on location, industries, companies, relationship, etc.
  • not all are easy to implement
• Sorting by relevance + relationship requires significant query-time work
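As a rough illustration of what faceted search computes, the sketch below counts how many matching profiles fall under each value of a facet field. It is a toy in-memory version under stated assumptions: the `facet_counts` function, the profile records, and the field names are hypothetical, and a real engine would compute these counts inside the index rather than by looping over results.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count, per facet field, how many matching documents carry each value."""
    counts = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            value = doc.get(field)
            if value is not None:  # skip documents missing this field
                counts[field][value] += 1
    return counts


# Hypothetical profiles that matched some query.
matches = [
    {"name": "A", "location": "San Francisco", "industry": "Software"},
    {"name": "B", "location": "San Francisco", "industry": "Finance"},
    {"name": "C", "location": "Boston", "industry": "Software"},
]
for field, counter in facet_counts(matches, ["location", "industry"]).items():
    print(field, dict(counter))
```

Facets such as location or industry are simple stored values, which is why they are easy to count. A facet like relationship depends on who is searching, so it cannot be precomputed per document, which is consistent with the slide's remark that not all facets are easy to implement.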
42. Yelp: People and Places
• Yelp facilitates user reviews
• Searches business meta-data plus review content
• Heavy geographic component
• Results are structured by establishment, but searchable by review
43. Xmarks: Patterns from the People
• Xmarks provides bookmark sync and Web discovery
• First provided bookmark sync; adopted by millions of users
• Aggregates bookmark folder structure and meta-data by URL
• This descriptive content is mined to provide a searchable index
• Needed new ranking algorithms to provide good relevance and filter out the noise
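The "aggregates by URL" step can be pictured as a group-by: every user's bookmark for the same address contributes its title and folder words to one combined record, which then becomes the searchable description of that page. This is a minimal sketch under assumed data shapes; the `aggregate_by_url` function and the `url`, `title`, and `folder` keys are hypothetical and not taken from Xmarks.

```python
from collections import Counter, defaultdict


def aggregate_by_url(bookmarks):
    """Merge every user's title and folder words into one record per URL."""
    merged = defaultdict(lambda: {"saves": 0, "terms": Counter()})
    for mark in bookmarks:
        record = merged[mark["url"]]
        record["saves"] += 1  # how many users bookmarked this URL
        words = (mark["title"] + " " + mark["folder"]).lower().split()
        record["terms"].update(words)  # descriptive terms supplied by users
    return dict(merged)


# Hypothetical bookmarks from three different users.
bookmarks = [
    {"url": "http://lucene.apache.org/", "title": "Apache Lucene", "folder": "Search"},
    {"url": "http://lucene.apache.org/", "title": "Lucene home", "folder": "Search Engines"},
    {"url": "http://example.com/", "title": "Example", "folder": "Misc"},
]
for url, record in aggregate_by_url(bookmarks).items():
    print(url, record["saves"], record["terms"].most_common(2))
```

Terms that many users independently attach to the same URL are a stronger signal than terms used once, which is one plausible starting point for the noise filtering the slide mentions.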
44. MySpace: Searching it all
• MySpace does it all:
• Many content types from all over the site
• User generated content + user interactions
• Near Real Time
• New content and users arriving 24x7
• Both end-user and administrative functions
  • admin functions include log file searching
  • automated tasks help identify spam, other problems
• Massive scale:
  • billions of records, petabytes of source data
  • new content at the rate of 1TB every week
45. Social Media is Pushing Search In New Directions
• Searches the product of interaction among users, not just content
• Aggregates data from multiple sources at search time
• Operates in real-time, as data is produced
• Extends the traditional notions of relevance
• Builds analytics on top of search
• and... you can build all of this on open source products!