The Seven Deadly Sins of Solr - By Jay Hill

Introductions…
- Who the hell am I?
  - Jay Hill, Lucid Imagination
  - 7 years Lucene experience
  - 4 years Solr experience
  - Author of Lucid Training
  - SME for Lucid Certification
- Who the hell are you?
  - New to search?
  - New to Lucene/Solr?
  - Battle-tested veterans?

© Lucid Imagination, Inc.

We'll Leave Time For Q&A
- Who's doing what?
  - Solr 3.1?
  - Solr 1.4.1?
  - Nightly build?
  - Solr 1.3 or older?
- Are there any specific problems you're having?
- Meanwhile, interrupt, ask questions as we go, etc.

A Brief Word About Lucid Imagination
- Lucid Imagination:
  - The commercial company supporting Lucene/Solr open source search.
  - Founded by
    - Yonik Seeley – Creator of Solr
    - Erik Hatcher – Co-author, Lucene In Action
    - Grant Ingersoll – Apache PMC Chair
    - Marc Krellenstein – Lucid CTO
  - Staff includes 9 Lucene/Solr committers
  - Training, certification, support, LucidWorks Enterprise

Lucid Customers (That I've Worked With)

…On To The Sinning!

Sins As Anti-Patterns?
- "Sorta kinda"
  - Specify Nothing (Sloth)
  - Creeping Featuritis (Greed)
  - Blowhard Jamboree (Pride)
  - Boat Anchor (Lust)
  - Not Invented Here (Envy)
  - Phatware (Gluttony)
  - Emperor's New Clothes (Wrath)

Sins Can Contradict One Another!
- You'll notice that many of the "sins" we see will be the exact opposite of others
- Just as some of us tend towards laziness, others tend towards excess
- Sometimes:
  - "Look before you leap."
- Other times:
  - "He who hesitates is lost."
- In Solr (or any search app), one size never fits all

"I don't know and I don't care."

Sloth
- "We aren't really into open source."
  - Lack of commitment to Solr and/or the search application itself
- Not developing in-house Solr expertise
- Not paying enough attention to JVM settings, garbage collection, and RAM allocation.

Sloth
- Neglecting to get familiar with the source code
  - It is open source after all!
- Not taking the time to understand the main parts of Solr:
  - Request handlers
  - Search components
  - Query parsers
    - Extend QParserPlugin class
  - ValueSource & ValueSourceParser – custom functions
    - New pseudo-fields in 4.x
  - Response writers

Sloth
- Not keeping up with new features and developments in Lucene and Solr
- CHANGES.txt – use "diff" to keep up on changes
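One possible way to follow the CHANGES.txt advice without a command-line diff tool is sketched here in Python. It lists the lines that appear in a newer CHANGES.txt but not in an older one. The function name `added_lines` and the sample entries are made up for illustration; real CHANGES.txt files are much longer and would be read from disk.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that are present in new_text but not in old_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute lines that only exist in the new file.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


if __name__ == "__main__":
    # Hypothetical excerpts, not real CHANGES.txt content.
    old = "Release A\n* Older feature\n"
    new = "Release B\n* Newer feature\nRelease A\n* Older feature\n"
    for line in added_lines(old, new):
        print(line)
```

Running the same comparison after each upgrade gives a short reading list of what changed since the version you last reviewed.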

Sloth
- New features in Solr 3.1:
  - Solr spatial
  - Edismax query parser
    - NOT experimental!
  - Dynamic metadata extraction via UIMA
  - Numeric range faceting (like date faceting)
  - Lucene RAMDirectoryFactory available
  - Faceting performance improvements
  - Spellcheck and Terms components now work for distributed search
  - Suggester component – better autosuggest!
    - Can add custom dict., phrases, etc.

Sloth
- New features coming in Solr 4.x:
  - Lucene DocumentsWriterPerThread (DWPT)
    - Moving towards "real time"
  - UpdateHandler upgrade to work with real-time
  - Field collapsing/grouping
  - Pivot facets
  - SolrCloud (ZooKeeper)
  - Fuzzy queries 100 times faster
  - Pseudo fields via functions
  - Relevancy function queries: tf, idf, docFreq, norm, …

Sloth: The Path To Salvation
- Commit to the project and to learning Solr
- Stay up to date on Solr changes
- Stay current with ongoing releases
- Get familiar with the source code
- Spend some time to understand the main configuration files:
  - solrconfig.xml
  - schema.xml
- Read through the entire Solr Wiki once every so often
- Develop in-house Solr expertise

Save a penny, lose a customer.

Greed
- Skimping on resources such as:
  - RAM
    - "Here's a quarter, buddy, go buy some RAM!"
  - Storage space
- You will get what you pay for!
  - …on the other hand, not every company has "deep pockets"

Greed
- Trying to "squeeze by", indexing to, and searching on, the same server
[Diagram labels: Indexing, Shards (Indexers), Slave/Searchers, Load Balancer, Searches]

Greed
- Not making the effort to find the right balance between precision and recall
  - Recall: What fraction of the relevant documents in the collection were returned by the system?
  - Precision: What fraction of the returned results are relevant to the information need?
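The two definitions above translate directly into arithmetic. The following is a minimal Python sketch, assuming you have a list of returned document IDs and a hand-judged set of relevant IDs for one query; the function name and the IDs are hypothetical.

```python
def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query from document IDs."""
    returned_set = set(returned)
    hits = len(returned_set & relevant)  # returned documents that are also relevant
    # Precision: share of the returned results that are relevant.
    precision = hits / len(returned_set) if returned_set else 0.0
    # Recall: share of all relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up judgments: 4 documents returned, 5 documents judged relevant.
    returned = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d7", "d8", "d9"}
    print(precision_recall(returned, relevant))
```

In this made-up case two of the four returned documents are relevant (precision 0.5) and two of the five relevant documents were found (recall 0.4). Tracking both numbers for a fixed set of test queries makes the trade-off visible as you tune.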

Greed
- A few thoughts about relevance:
  - Get feedback from domain experts
  - Is it better to have lots of results with less precision, or fewer, more targeted results?
  - Different sites will have very different requirements

Greed: The Path To Salvation
- Pry open your wallet – don't be cheap
- You don't have to push the envelope
- Find the right balance between recall and precision
- Don't push for more results over precision – unless that is a clear requirement (sometimes it is)

"What could possibly go wrong?"

Pride
- Reinventing the wheel
  - "Why don't we just write our own search libraries?"
  - Nobody has a use case like us – right?
  - "We need to change the scoring algorithms."

Pride
- Thinking you can "do it all" in Solr
  - Solr is rarely a good choice as a system of record (SOR)
- Consider other tools to work with Solr:
  - Nutch
  - Mahout
  - OpenNLP
  - Google Connector Framework
  - Your own code

Pride
- Stubbornly refusing to use resources such as the mailing lists:
  - Solr user list: solr-user@lucene.apache.org
  - Solr developer list: dev@lucene.apache.org
  - Lucene user list: java-user@lucene.apache.org
- LucidFind: http://www.lucidimagination.com/search/

Pride
- "I will not yield!"
  - Trying to "win battles" on the mailing lists
  - Good Karma – be a good citizen in the community

Pride: The Path To Salvation
- Ask for help when needed
- Let the business needs define the project – don't let the tail wag the dog
- Get a feel for the Solr community and respect the experience of others
- Your situation, while possibly unique, is probably not completely dissimilar to others. Learn from the pioneers and Solr veterans

"Someone stop me!"

Lust
- Obsessing over unimportant details too early in the project
  - Agile approach is well suited to Solr development – iterate!
- Trying to "push the envelope"
  - Necessary sometimes, but it's not called the "bleeding edge" without reason
  - "Ease in" to major changes
- Too much attention to JVM settings
  - Solr experts are not usually JVM/GC experts

Lust
- "Anti-greed" – Committing too many resources to Solr
  - Make sure the OS has plenty of RAM to cache files, etc.
- "If one is good, a dozen must be better!"
  - As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
  - Solr has proven to be much more efficient than some large, commercial search solutions

Lust
- Blood from a turnip:
  - Trying some absurd new technique, "just because"
- RAMDirectoryFactory – not a secret way to faster indexing/searching
  - No disk-backed persistence
  - Usually not worth it
  - …but you never know…
- Research first before going "extreme"

Lust
- No need to index millions of docs for development
- Better to work with small sets of data while getting started.
- Don't worry too much about field types as you get started. Get data in the index, then analyze and refine.

Lust: The Path To Salvation
- Use an agile approach – start simply, build your application slowly, iterate
- Deal with the low-hanging fruit first
- Measure twice, cut once
- Don't miss the forest for the trees – no need to obsess over details in the early stages
- Do some due diligence before trying unorthodox approaches
- Get a small sample of data indexed without worrying about field types, then refine in iterations

"If we had some bacon we could have some bacon and eggs – if we had some eggs."

Envy
- Adding "cool" features you see on other sites, but don't really need
  - Keep it "lean and mean", especially to start
  - Resist the urge to include the "kitchen sink"

Envy
- You too can master dismax!
  - Don't be afraid of dismax/edismax
  - Lots of controls to learn, but also lots of power
    - Flexibility to search multiple fields
    - Boost different fields
    - Boost phrase fields (pf) higher than query fields (qf)
    - Use boost queries (bq) and function queries (bf)
  - Most intimidating params:
    - tie
    - mm
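To make the dismax parameters named above less abstract, here is a rough Python sketch that assembles a request URL using qf, pf, bq, bf, tie, and mm. The base URL, the field names (`title`, `description`, `inStock`, `popularity`), and all boost values are assumptions for illustration only; sensible values depend entirely on your schema and data.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust for your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_dismax_params(user_query: str) -> dict[str, str]:
    """Assemble dismax parameters; field names and boosts are hypothetical."""
    return {
        "defType": "dismax",
        "q": user_query,
        # qf: fields to search, with per-field boosts.
        "qf": "title^2.0 description",
        # pf: phrase fields, boosted higher than qf so exact phrases rank up.
        "pf": "title^10.0 description^3.0",
        # bq: a boost query that nudges matching documents upward.
        "bq": "inStock:true^1.5",
        # bf: a function query over a hypothetical numeric field.
        "bf": "log(popularity)",
        # tie: how much lower-scoring field matches count (0.0 = pure max).
        "tie": "0.1",
        # mm: minimum-should-match rule for the optional clauses.
        "mm": "2<-1 5<80%",
    }


def build_dismax_url(user_query: str) -> str:
    """Return a full, URL-encoded select request."""
    return SOLR_SELECT_URL + "?" + urlencode(build_dismax_params(user_query))


if __name__ == "__main__":
    print(build_dismax_url("red shoes"))
```

A reasonable way to learn tie and mm is to change one value at a time and compare results with debugQuery=true turned on.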

Envy
- Spatial search – seems complicated, but major sites make it look easy
- Now, in Solr 3.1 – it is easy!
- You can:
  - Store spatial data in your index
  - Filter by distance
  - Sort by distance
  - Boost/bias by distance
  - Facet by distance
- Also consider: Search-based navigation such as "Show me in-stock items only"
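As a hedged illustration of filtering, sorting, and faceting by distance, the Python sketch below builds request parameters in the style of Solr 3.1 spatial search (geofilt, geodist(), sfield, pt, d) as best recalled from the Solr spatial documentation; check them against your version. The field names `store` and `inStock`, the coordinates, and the facet ranges are assumptions.

```python
from urllib.parse import urlencode


def build_spatial_params(lat: float, lon: float, radius_km: float,
                         in_stock_only: bool = False) -> list[tuple[str, str]]:
    """Build query parameters for a distance-filtered, distance-sorted search."""
    params = [
        ("q", "*:*"),
        ("sfield", "store"),          # hypothetical lat/lon field in the schema
        ("pt", f"{lat},{lon}"),       # the point to measure from
        ("d", str(radius_km)),        # radius in kilometers
        ("fq", "{!geofilt}"),         # filter by distance
        ("sort", "geodist() asc"),    # sort by distance, nearest first
        ("facet", "true"),
        # Facet by distance: two example rings around the point.
        ("facet.query", "{!frange l=0 u=5}geodist()"),
        ("facet.query", "{!frange l=5.001 u=20}geodist()"),
    ]
    if in_stock_only:
        # Search-based navigation: "Show me in-stock items only".
        params.append(("fq", "inStock:true"))
    return params


if __name__ == "__main__":
    print(urlencode(build_spatial_params(45.15, -93.85, 5, in_stock_only=True)))
```

A list of pairs is used instead of a dict because fq and facet.query may each appear more than once in a request.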

Envy: The Path To Salvation
- Focus on your requirements; don't try to add "bells and whistles" you don't need
- Don't be hesitant to dive into the power of dismax/edismax
- Take advantage of new features such as Solr spatial, if those features will add value to the end user experience

Gluttony
- "Staying fit and trim" is usually good practice when designing and running Solr applications
  - Once again – keep it "lean and mean"
- A lot of these issues cross over into the "Sloth" category
  - The effort needed to keep your configuration and data efficiently managed is not considered important
- Don't lose control of your configuration files
  - Remove unnecessary elements
  - Version control all configuration files

Gluttony
- Slim down those "bloated" queries:
  - q="red shoes"&accountId=(12343 OR 338899
    OR 554443 OR 243445 OR 55442 OR 3330899
    OR 59927 OR 3888999 OR 549 OR 440293579 OR
    34201 OR 339917 OR 300191 OR 339338 OR
    109823 OR 679176 OR 31407815 OR 3001756
    OR 134322 OR 311123 OR 987888 OR 997181 OR 771819 OR
    100292 OR 3389474 OR 5505759 OR 2459577 OR 4499957 OR
    1996571 OR 559590 OR 220299 OR 4404872 OR 151510 OR
    66017 OR 666 OR 113459 OR 890575 OR 505725 OR 330393 OR
    349940 OR 4094994 OR 1245995 OR 2459959 OR 4255909 OR
    899955 OR 7878899 OR 100999 … ∞ )

Gluttony
- Stay in shape – Flex Your Solr Muscles!
  - Keep up on new features
  - Training, when appropriate
  - Certification
  - Contribute!
  - Follow the user lists
  - Refactor when new features can help
  - Keep up to date on new releases

Gluttony: The Path To Salvation
- Keep configuration files clean and trim. Remove unused elements
- Periodically review queries to make sure they are efficient
- Refactor when necessary – keep your application fit and trim

Wrath
- Wrath – usually synonymous with anger, but…
- Let's use an older definition here:
  - "A vehement denial of the truth, both to others and in the form of self-denial and impatience."
- Step back every now and then and look objectively at your application

Wrath
- Ignoring new Solr releases
  - OK to wait until a release is proven
  - But getting too far behind makes upgrading more painful with each release
- We don't have time to do it right, but we always have time to fix it

Wrath
- Ignoring complaints about results relevance
- Disregarding feedback from stakeholders
- Remember – the point of your search application is to support the business, not to "build cool stuff"
- Not taking advantage of log files
  - Consider mining log files and storing the data in a relational DB for generating reports
  - Capturing user queries and query counts can be extremely useful
  - Can also be used for query-based autosuggest (not just indexed terms)
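A small Python sketch of the log-mining idea follows: it counts user queries from request log lines and then uses those counts for a naive query-based autosuggest. The log line layout (a `params={...}` section holding the URL-encoded request parameters) is an assumption modeled on typical Solr request logging, and the sample lines are invented; adapt the regular expression to whatever your logs actually contain.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed format: each request line contains "params={...}".
# Note: this simple pattern stops at the first "}", so queries that
# themselves contain "}" would need a more careful parser.
PARAMS_RE = re.compile(r"params=\{([^}]*)\}")


def count_queries(log_lines) -> Counter:
    """Count normalized q values found in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        q_values = parse_qs(match.group(1)).get("q")
        if q_values:
            counts[q_values[0].strip().lower()] += 1
    return counts


def suggest(counts: Counter, prefix: str, limit: int = 5) -> list[str]:
    """Return the most frequent logged queries that start with prefix."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))  # by count, then alphabetically
    return [q for q, _ in matches[:limit]]


if __name__ == "__main__":
    # Invented sample lines for demonstration.
    sample_log = [
        "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=42 status=0 QTime=3",
        "INFO: [] webapp=/solr path=/select params={q=Red+Shoes&rows=10} hits=42 status=0 QTime=2",
        "INFO: [] webapp=/solr path=/select params={q=red+boots} hits=7 status=0 QTime=5",
        "INFO: [] webapp=/solr path=/update params={} status=0 QTime=12",
    ]
    counts = count_queries(sample_log)
    print(counts.most_common())
    print(suggest(counts, "red"))
```

The same counts could be loaded into a relational table for reporting, as the slide suggests; the in-memory Counter here just keeps the example short.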

Wrath: The Path To Salvation
- Keep your version of Solr up to date
  - OK to wait "awhile", but don't skip versions
- Seek and embrace feedback from business and domain experts
- Constantly gauge and improve relevance as an ongoing task
- Avoid the push to release too soon (as best you can)
- Take advantage of log files to understand what users are doing, and what is not working well

Search, and you shall find!
