This document outlines several examples that are included out-of-the-box (OOTB) with Solr 6.2, including techproducts, schemaless, cloud, and dih examples. It describes the configuration and data for each example, and how they can be rebuilt if removed. It also discusses the basic_configs configuration option and differences between configured vs schemaless modes.
3. Who am I
• Software developer with 20+ years of experience
– Including 3 years as Senior Tech Support (BEA Weblogic)
• Solr popularizer
• Published book author on Solr Indexing (for Solr 4.3)
• Run http://www.solr-start.com resource site
• Solr committer (since August 2016)
• Past and present Solr focus on onboarding, usability, tooling, information sharing
4. Example catch-22
• Search is a - surprisingly - complex expertise
• Solr is a complex product
– Wide
– Deep
– History-rich
• And so are its many examples
5. Fasten the seatbelt
• Review all of the (Solr 6.2) OOTB examples
• Make a small one from scratch
• Deconstruct a real shipped example
• Next learning action...
6. OOTB Examples – how many?
bin/solr start -e
-e <example>  Name of the example to run; available examples:
    cloud:         SolrCloud example
    techproducts:  Comprehensive example illustrating many of Solr's core capabilities
    dih:           Data Import Handler
    schemaless:    Schema-less example
7. techproducts example
• Used to be collection1
• solr.home: example/techproducts/solr
– Can restart with bin/solr start -s example/techproducts/solr
– Actual core at example/techproducts/solr/techproducts
8. techproducts example (cont.)
• Source configuration
– server/solr/configsets/sample_techproducts_configs
– Not actually a configset (copy, not share)
• Can be rebuilt
rm -rf example/techproducts
• Has data (14 files of products, money, utf8 tests)
bin/post -c techproducts example/exampledocs/*.xml
9. schemaless example
• solr.home: example/schemaless/solr
• Actual core: example/schemaless/solr/gettingstarted
• Source configuration:
– server/solr/configsets/data_driven_schema_configs
– Config you get when you do not specify one:
bin/solr create -c newcore
• No data, but can take (nearly) anything:
bin/post -c <name> example/exampledocs/*.xml
10. schemaless mode?
• “Let us guess what you mean”
– Auto-guess field type based on first content occurrence
– Create explicit field definitions
• booleans, dates, numbers, strings
• Always multivalued (because: who knows?!?)
• Can be configured (URP chain in solrconfig.xml)
– Rewrites managed-schema (comments begone!)
– Makes search work with <copyField source="*" dest="_text_"/>
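The "guess on first content occurrence" behaviour can be illustrated with a small sketch. This is not Solr's update request processor chain; it is a simplified stand-in written for this walkthrough. The function names and the single date format are assumptions, and the returned names merely mirror the plural type names (booleans, tlongs, tdoubles, tdates, strings) mentioned elsewhere in this deck.

```python
from datetime import datetime


def guess_field_type(value: str) -> str:
    """Guess a multivalued type name from one raw string value (illustrative only)."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "booleans"
    try:
        int(text)
        return "tlongs"
    except ValueError:
        pass
    try:
        float(text)
        return "tdoubles"
    except ValueError:
        pass
    try:
        # Assumption: only one ISO-like date format is recognised in this sketch.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "tdates"
    except ValueError:
        pass
    return "strings"


def build_schema(docs):
    """The first value seen for a field fixes its type; later values are not re-examined."""
    schema = {}
    for doc in docs:
        for name, value in doc.items():
            schema.setdefault(name, guess_field_type(value))
    return schema
```

Feeding it a document with "price": "0.0" followed by one with "price": "cheap" leaves price typed as tdoubles, which is the kind of surprise that first-occurrence guessing can produce.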
11. techproducts vs schemaless
• Configured techproducts vs auto-detecting schemaless
• Strings
techproducts: "name":"Test with some GB18030 encoded characters",
schemaless: "name":["Test with some GB18030 encoded characters"],
• Numbers
techproducts: "price":0.0, "price_c":"0.0,USD",
schemaless: "price":[0.0],
• Booleans
techproducts: "inStock":true,
schemaless: "inStock":[true],
12. cloud example
• Highly configurable (unless using -noprompt)
• solr.home: example/cloud/nodeX/solr
• Source configuration is a choice
Please choose a configuration for the gettingstarted collection, available options are:
basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
• Can be rebuilt:
bin/solr stop -all
rm -rf example/cloud
• Demonstrates Config API (configoverlay.json)
13. dih example(s)
• Data import handler – legacy, but still kicking
• solr.home: example/example-DIH/solr
• Has 5 (five!) different cores
– db - database import (example/example-DIH/hsqldb/ex.*)
– solr - import from another Solr core (configured for db core)
– mail - import from IMAP (needs some configuration)
– tika - import rich-content (example/exampledocs/solr-word.pdf)
– rss - external XML feed (very broken right now)
• Cannot be rebuilt – only emptied
bin/post -c db -type 'application/json' -d '{delete: {query:"*:*"}}'
14. What about: bin/solr start?
• solr.home: server/solr
• No initial collection/cores, have to create explicitly:
– With script (see bin/solr create_core -h for details):
bin/solr create -c <corename> -d <name or path>
– With Core Admin UI for non-SolrCloud:
http://localhost:8983/solr/admin/cores?action=CREATE&…
– With Collection API for SolrCloud:
http://localhost:8983/solr/admin/collections?action=CREATE&…
15. basic_configs configuration
• Available for cloud example and explicit creation
• Schemaless mode is configured, not enabled
• “Minimal Solr configuration” !?!
– managed-schema: 1005 lines
– solrconfig.xml: 1484 lines
16. files example
• Specifically tuned for file indexing
– Augmented schemaless mode with language, content-type guessing
– Custom /browse end-point
– Source configuration: example/files/conf
– Setup instructions: example/files/README.txt
– Bring your own data
18. films example
• Schemaless (based on data_driven_schema_configs)
– Uses Schema API to add custom fields
– Uses schemaless for rest of fields
• Comes with its own data (1100 film records)
• Uses velocity (/browse), Schema API, Request Parameters API (params.json)
• Setup instructions: example/films/README.txt
19. That was the good news
• Many examples
• Easy to get one running
• Some come with data
• Some you can throw your own data into
• Lots of comments
20. This is the bad news
Example                Files  Types  Fields  Dynamic Fields  managed-schema size  solrconfig.xml size
basic                     46     71       4              73                 1005                 1484
data_driven               46     71       4              73                 1005                 1482
techproducts             101     66      33              28                 1149                 1701
dih db                    62     62      31              28                 1129                 1490
dih tika                   6     61       3              27                  901                 1466
files                     69     73       9              73                  517                 1508
films (data_driven+)      46     71       8              73                  481                 1482
21. Tip – getting these numbers
• XML extraction with XMLStarlet (XSLT CLI)
– xml sel -t -m "//fieldType" -v @name -n managed-schema
– xml sel -t -m "//copyField" -c . -n managed-schema | wc -l
– xml sel -t -m "//*[@docValues]" -v "concat(local-name(), ' ', @name, ' docValues:', @docValues)" -n managed-schema
– xml sel -t -m "//requestHandler" -v "@name" -n solrconfig.xml
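For readers without XMLStarlet, roughly the same counts can be obtained with Python's standard library. This is a minimal sketch, not the method used to produce the numbers in this deck. The inline SAMPLE_SCHEMA is a made-up fragment for illustration; in practice the text would be read from a real managed-schema file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny schema used only to make the sketch self-contained.
SAMPLE_SCHEMA = """<schema name="demo" version="1.6">
  <field name="id" type="string" docValues="true"/>
  <field name="name" type="text_general"/>
  <dynamicField name="*_s" type="string"/>
  <copyField source="*" dest="_text_"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_general" class="solr.TextField"/>
</schema>"""


def schema_stats(xml_text):
    """Count schema elements and list the ones that set docValues."""
    root = ET.fromstring(xml_text)
    return {
        "types": len(root.findall(".//fieldType")),
        "fields": len(root.findall(".//field")),
        "dynamic_fields": len(root.findall(".//dynamicField")),
        "copy_fields": len(root.findall(".//copyField")),
        # Mirrors the concat(local-name(), ' ', @name, ' docValues:', @docValues) query.
        "doc_values": [
            f"{el.tag} {el.get('name')} docValues:{el.get('docValues')}"
            for el in root.iter()
            if el.get("docValues") is not None
        ],
    }


stats = schema_stats(SAMPLE_SCHEMA)
```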
22. Why is it like this?
• Many examples predate Solr Reference Guide
• grep for options, possibilities, defaults
• Each example is a kitchen sink
“Too much of a good thing is also a bad thing”
Source: 1980s Soviet joke about Virtual Reality
24. Go small – managed-schema (2)
…
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text_basic" class="solr.TextField">
<analyzer>
<tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>
</fieldType>
</schema>
25. Go small – solrconfig.xml
<config>
<luceneMatchVersion>6.2.0</luceneMatchVersion>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="df">text</str>
</lst>
</requestHandler>
</config>
26. Go small – load and test
• bin/solr create -c demo -d .../demo-config/
• bin/post -c demo example/exampledocs/*.xml
• Test it works, using HTTPie (HTTP CLI)
28. Go small - review
• Minimal example could be very minimal
• Some things will not work
– No uniqueKey – no way to update documents, no SolrCloud
– No _version_ – no SolrCloud
– Everything is multiValued – no sorting
– copyField * => text, no meaningful relevancy, specialized analyzer chain processing
29. Deconstructing films example
• bin/solr create -c films
• curl http://localhost:8983/solr/films/schema ... (add name, initial_release_date)
• Index 1100 records from
– (Solr) XML,
– (generic) JSON (doc), or
– CSV format
• Search for batman
• Use /browse end-point and search for batman
• Enable highlighting in results
31. Initial stats for films core
Sizes (line counts)
managed-schema*        481
solrconfig.xml        1482
params.json             20
File count in conf
.txt                    41
.xml                     3
.json                    1
managed-schema (xml)     1
* already has no comments
32. Deconstructing – just straight tags
• managed-schema lost comments during construction
• Let's remove comments from solrconfig.xml
• xml ed -L -d "//comment()" solrconfig.xml
– -L: edit in place
– -d: delete nodes matching the XPath
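The same comment stripping can be approximated in Python, since the default ElementTree parser does not keep comments in the parsed tree. This is a sketch under that assumption, not a drop-in replacement for the XMLStarlet command: it also drops any XML declaration and processing instructions, and the sample config text is invented for illustration.

```python
import xml.etree.ElementTree as ET


def strip_comments(xml_text: str) -> str:
    """Parse and re-serialize; the default parser discards comment nodes."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment standing in for a real solrconfig.xml.
sample = """<config>
  <!-- Controls which Lucene version behaviour is emulated -->
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
</config>"""

cleaned = strip_comments(sample)
```

Whitespace around the removed comments is preserved, so the result may contain blank lines where comments used to be.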
34. Deconstructing – what to clean
• Currently
– (explicit) fields: 8
– dynamic fields: 73
• xml sel -t -m "//dynamicField" -v @name -n managed-schema | wc -l
– types: 71
– copyFields: 1
• Let's start from dynamic fields
35. Deconstructing – dynamic fields
• Used dynamic fields
– do NOT modify schema
– DO show up in Admin UI, if used
– Example from different schema:
• Used/matched fields
• Generic definitions
37. Deconstructing – in use dynamic fields
• NO dynamic fields are used
– * is a copyField instruction
• Can remove them all
• xml ed -L -d "//dynamicField" managed-schema
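Whether any dynamic field pattern is actually matched can also be checked offline, given a list of field names present in the index. The sketch below is an illustration only: the function name, the sample patterns and most of the sample field names are hypothetical (only name and initial_release_date come from this deck), and it treats dynamic field patterns as simple globs via fnmatch, which is an approximation of how Solr matches them.

```python
from fnmatch import fnmatchcase


def used_dynamic_fields(patterns, indexed_field_names, explicit_fields):
    """Map each dynamic field pattern to the indexed names it would match."""
    used = {}
    for name in indexed_field_names:
        if name in explicit_fields:
            continue  # explicit definitions win over dynamic patterns
        for pattern in patterns:
            if fnmatchcase(name, pattern):
                used.setdefault(pattern, []).append(name)
                break
    return used


patterns = ["*_s", "*_i", "attr_*"]                 # hypothetical patterns
explicit = {"id", "name", "initial_release_date"}   # partly hypothetical
none_used = used_dynamic_fields(patterns, ["id", "name", "initial_release_date"], explicit)
one_used = used_dynamic_fields(patterns, ["id", "name", "color_s"], explicit)
```

An empty result corresponds to the situation described above, where all dynamic field definitions can be deleted.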
39. Deconstructing – field types
• How many types out of 71 do we use?
– xml sel -t -m "//field|//dynamicField" -v "@type" -n conf/managed-schema | sort -u
– long, string, strings, tdate, text_general
• But also some in solrconfig.xml
– booleans, string, strings, tdates, tdoubles, text_general, tlongs
• Combined total: 9 field type definitions
• Delete the rest (by hand)
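Finding the field types that are safe to delete can be sketched in Python as a set difference between defined and referenced types. This is an assumption-laden illustration: the function name and sample XML are invented, and the solrconfig.xml check is a crude text heuristic (a type name appearing as the full text of some element), not a real parse of the schemaless update chain.

```python
import re
import xml.etree.ElementTree as ET


def unused_field_types(schema_xml, solrconfig_text=""):
    """Return fieldType names referenced neither by fields nor by solrconfig text."""
    root = ET.fromstring(schema_xml)
    defined = {ft.get("name") for ft in root.iter("fieldType")}
    used = {
        el.get("type")
        for tag in ("field", "dynamicField")
        for el in root.iter(tag)
    }
    # Heuristic: treat ">typename<" in solrconfig.xml as a reference to that type.
    used |= {
        name
        for name in defined
        if re.search(r">\s*%s\s*<" % re.escape(name), solrconfig_text)
    }
    return sorted(defined - used)


# Hypothetical fragments, for illustration only.
schema = """<schema name="demo" version="1.6">
  <field name="id" type="string"/>
  <field name="name" type="text_general"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_general" class="solr.TextField"/>
  <fieldType name="tlongs" class="solr.TrieLongField" multiValued="true"/>
  <fieldType name="text_en" class="solr.TextField"/>
</schema>"""
solrconfig = '<lst name="typeMapping"><str name="fieldType">tlongs</str></lst>'

leftovers = unused_field_types(schema, solrconfig)
```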
41. Deconstructing – support files
• Inside lang directory (38 files)
– find lang -name 'stopwords_*.txt' | wc -l
• stopwords_*.txt: 30 files
• contractions_*.txt: 4 files
– find lang -type f | egrep -v 'stopwords_|contractions_'
• hyphenations_ga.txt, stemdict_nl.txt, stoptags_ja.txt, userdict_ja.txt
42. Support files – still in use?
• Check for usage
– grep -o 'stopwords_.*.txt' managed-schema solrconfig.xml
– grep -o 'contractions_.*.txt' ...
– ...
• NO Matches (we no longer have related types)
– Delete the whole lang directory
• What about files just inside config directory
– Don't need currency.xml, protwords.txt
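The grep check generalizes to any list of candidate support files. A minimal sketch, assuming file names appear in the configuration as complete quoted attribute values (for example words="stopwords.txt"); the function name and the sample configuration text are hypothetical.

```python
def unreferenced_files(config_texts, candidate_files):
    """Return candidates that never appear as a quoted attribute value."""
    combined = "\n".join(config_texts)
    return sorted(
        name for name in candidate_files
        if f'"{name}"' not in combined
    )


# Hypothetical analyzer definition standing in for a trimmed managed-schema.
schema_text = """<fieldType name="text_general" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>"""

candidates = ["stopwords.txt", "synonyms.txt", "protwords.txt", "lang/stopwords_en.txt"]
removable = unreferenced_files([schema_text], candidates)
```

Files reported here are only candidates for deletion; anything referenced from code or from other configuration files outside the checked texts would be missed.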
46. The mystery of _root_
• In the original schema – no explanations
• Documentation – used for nested documents:
To support nested documents, the schema must include an indexed/non-stored field _root_. The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth.
• We are not using nested documents
• And neither is any other shipped example...
49. text_general support files
• stopwords.txt
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
• synonyms.txt
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# ......
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
# Synonym mappings can be used for spelling correction too
pixima => pixma
50. text_general's empty stopwords
• No file => default stopwords => English
• Empty file => disabled stopwords
• Currently – NOT used
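A quick way to see whether a stopwords or synonyms file is "empty" in the sense used here (only comments and blank lines) is to filter those lines out. This is a sketch that assumes # starts a comment line, as in the files shown in this deck; the function name and the sample text are illustrative.

```python
def effective_entries(file_text):
    """Return the lines that are neither blank nor # comments."""
    entries = []
    for line in file_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            entries.append(stripped)
    return entries


# Shortened, illustrative stand-ins for the shipped files.
license_only = """# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.

# limitations under the License.
"""
synonyms = """#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
"""

stopword_entries = effective_entries(license_only)
synonym_entries = effective_entries(synonyms)
```

A file for which this returns an empty list behaves like the "empty file" case above rather than the "no file" case.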
53. How far did we get
Sizes (line counts)   Before  After
managed-schema*          481     26
solrconfig.xml          1482    278
params.json               20
File count in conf    Before  After
.txt                      41      0
.xml                       3      2
.json                      1
managed-schema (xml)       1
* already has no comments
54. Deconstructing – solrconfig.xml
• solrconfig.xml is more complex than schema
• Heterogeneous sections
• Nested definitions
• Alternative implementations (e.g. highlighter)
• Also remember
– configoverlay.json – overrides solrconfig.xml
– params.json – additional configuration parameters
58. add-unknown-fields-to-the-schema
• Famous "schemaless" mode
• Generic, but fully configurable
• Far from perfect
– Remember, we had to manually pre-add fields
– Development, not production
– Has normalization side-effects (normalizes dates)
• Cannot remove it in our example
60. highlighter – the truth
• Highlighter searchComponent is in default stack
• The params are a mix of standard highlighter, alternative FastVector highlighter
• Cannot use FastVector version as schema fields are missing termVectors, etc
• And standard highlighter params are same as implicit values
• Therefore, we can remove the WHOLE definition
64. solrconfig.xml – more stuff
• There is more that can be taken out
– query section, since you have to tune it anyway
– updateHandler, and revert to basic commits
– jmx
– enableRemoteStreaming – definitely take that out
• But keep velocity, browse, search support
65. Next action
• Join the (virtual) Solr Example Reading Group
– Starts November 2016
– Register at http://bit.ly/SolrERG
• Join mailing list at http://www.solr-start.com
– Get the link to the presentation source
– Learn about other similar projects
– Get news of Solr articles and projects on the web