This session covers how Search Head Clustering provides horizontal scalability to support more users and searches, and high availability to ensure users can access their searches at all times. We also cover the architecture, how it works, and best-practice guidance for large-scale deployments.
2. Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
3. Agenda
! What is Search Head Clustering?
! Business Benefits of Search Head Clustering
! SHC Configuration / Replication
! App Deployment
! Tips and Tricks – Migration
! Q&A
4. Search Head Clustering
Ability to group search heads into a cluster in order to provide Highly Available and Scalable search services.
MISSION CRITICAL ENTERPRISE
5. Business Benefits of SHC
! Horizontal scaling
! Always-on search services
! Consistent user experience
! Easy to add / manage premium content (apps)
6. SHP vs. SHC
SHP (Search Head Pooling):
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues
SHC (Search Head Clustering):
• No NFS
• Replication using local storage
• Commodity hardware
7. Design Goals and Implementation
Design Goals:
1. No single point of failure
2. "One configuration" across search heads
3. Horizontal scaling
Implementation:
1. Dynamic captain
2. Automatic config replication across search heads
3. Ability to add/remove nodes on a running cluster
8. SHC – How Does it Work?
1. Group search heads into a cluster
2. A captain gets elected dynamically
3. User-created reports/dashboards are automatically replicated to the other search heads
14. Details
! Captain updates RA/DM (report acceleration / data model) summaries on indexers
! Scheduler limits honored across the cluster
! Real-time scheduled searches run as one instance across the cluster
! Auto-failover – the new captain becomes the scheduler
! captain_is_adhoc_searchhead knob to reduce captain load
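As a sketch of how that knob is set (following the server.conf stanza conventions shown later in this deck; the value is just an example for your environment):

```ini
# server.conf on cluster members: keep the elected captain out of
# the ad hoc search rotation to reduce its load
[shclustering]
captain_is_adhoc_searchhead = true
```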
15. Alerts & Suppression
! Alerts fired when results of a search meet the alerting criteria
– Historical searches – within 10 seconds after the job completes
– Real-time searches – on an ongoing basis
! Captain merges and maintains a global view of alerts
! Suppression information centralized by the captain
! Merged alerts/suppression sent back to members
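The captain's merge step can be pictured with a minimal sketch. All names here are illustrative assumptions, not the actual SHC code; the "latest expiry wins" rule simply stands in for whatever merge policy the captain applies before sending the view back to members.

```python
from datetime import datetime, timedelta

def merge_suppressions(member_views):
    """Merge per-member suppression records into the captain's global
    view, keeping the latest expiry seen for each alert key."""
    merged = {}
    for view in member_views:
        for key, expires in view.items():
            if key not in merged or expires > merged[key]:
                merged[key] = expires
    return merged

now = datetime(2015, 1, 1, 12, 0)
sh1 = {"errors_spike": now + timedelta(minutes=5)}
sh2 = {"errors_spike": now + timedelta(minutes=9),
       "disk_full": now + timedelta(minutes=1)}

# The merged view is what the captain would push back to every member.
global_view = merge_suppressions([sh1, sh2])
```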
19. Artifact Proxying
! Replication guarantees HA & DR, but...
! SID not available on all nodes *locally*
! Real-time searches are not replicated
! We use proxying to fill these gaps
! Proxying on REST request
Authentication is cluster aware!!
(Diagram: members report artifact locations to the captain over heartbeats (HB = Heartbeat); a REST request for a non-local SID is proxied to the origin node, while replicas r1 and r2 are created by asynchronous replication.)
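The proxying decision can be sketched as follows. The helper names (`captain_locations`, `proxy_get`) are hypothetical stand-ins for the captain's artifact map and the proxied HTTP call, not Splunk APIs:

```python
def serve_artifact(sid, local_sids, captain_locations, proxy_get):
    """Serve a search artifact by SID: use the local copy if present,
    otherwise proxy the REST request to a node that holds it."""
    if sid in local_sids:
        return ("local", sid)                 # artifact is on this node
    owners = captain_locations.get(sid)       # captain knows locations
    if not owners:
        raise KeyError("unknown sid: %s" % sid)
    return proxy_get(owners[0], sid)          # transparent proxy call

# Toy usage: sid42 lives on sh2/sh3, this node only holds sid7.
locations = {"sid42": ["sh2", "sh3"]}
result = serve_artifact("sid42", {"sid7"}, locations,
                        lambda node, sid: ("proxied", node, sid))
```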
20. Adhoc Search Management
! Adhoc search – an interactive search run from a user session
! Adhoc searches are not replicated
! The captain, however, will have global knowledge of all searches
! GET services/search/jobs will return the global list of searches
! You can proxy and access adhoc searches from any node
21. Reaping of Search Artifacts
! Reaping – deletion of search results when the TTL (time to live) expires
! Search artifacts are reaped from the origin node
! Captain orchestrates reaping of the replicas
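The TTL check itself is simple; a minimal sketch (illustrative only – in the real cluster the origin node reaps its own artifacts and the captain drives deletion of the replicas):

```python
import time

def reap_expired(artifacts, now=None):
    """Keep only artifacts whose TTL has not yet expired.
    Each artifact is a (sid, created_at, ttl_seconds) tuple."""
    now = time.time() if now is None else now
    return [a for a in artifacts if a[1] + a[2] > now]

# sid1 has 600s of TTL, sid2 only 100s; at t=300 only sid1 survives
arts = [("sid1", 0, 600), ("sid2", 0, 100)]
survivors = reap_expired(arts, now=300)
```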
23. HA & Auto-Failover
Design Goals:
1. No single point of failure
2. Continuous uptime
3. Consistent user experience
Implementation:
1. Dynamic captain election
2. Auto failover
3. Proxying for a consistent view
24. Dynamic Captain
! Raft Consensus Protocol from Stanford
– Diego Ongaro & John Ousterhout
– Acknowledgments to Diego Ongaro for his help!
! SHC uses Raft for LE and auto-failover
RV = Request Vote; LE = Leader Election; SHC = Search Head Clustering
(Diagram: members S1–S5; when the captain fails, a new captain is elected.)
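A heavily simplified sketch of the majority-vote step in a Raft-style election (illustrative only; real Raft also involves terms, logs, and randomized election timeouts, and the names below are not the actual SHC implementation):

```python
def run_election(candidate, peers, grant_vote):
    """One election round: the candidate votes for itself, sends an
    RV (Request Vote) to each peer, and becomes captain only with a
    strict majority of the whole cluster."""
    cluster_size = len(peers) + 1
    votes = 1  # the candidate votes for itself
    for peer in peers:
        if grant_vote(peer, candidate):
            votes += 1
    return votes > cluster_size // 2

# 5-member cluster: S1 stands for election, S2 and S3 grant votes,
# S4 and S5 are unreachable -> 3 of 5 is still a majority.
up = {"S2", "S3"}
elected = run_election("S1", ["S2", "S3", "S4", "S5"],
                       lambda peer, cand: peer in up)
```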
25. Auto-Failover
(Diagram: when the old captain fails, a new captain takes over the scheduler role, and fixups redistribute artifacts, running jobs, alerts, etc., plus search load across the remaining members.)
27. Configuration Files
! Custom user content
– Reports
– Dashboards
! Search-time knowledge
– Field extractions
– Event types
– Macros
! System configurations
– Inputs, forwarding, authentication
28. Goal
! Consistent user experience across all search heads
! Changes made on one member are reflected on all members
29. Configuration Changes
! Users customize search and UI configurations via UI/CLI/REST
– Save a report
– Add a panel to a dashboard
– Create a field extraction
! Administrators modify system configurations
– Configure forwarding
– Deploy centralized authentication (e.g. LDAP)
– Install an entirely new app or hand-edited configuration
30. Search and UI Configurations
! Changes to search and UI configurations are replicated across the search head cluster automatically
! Goal: eventual consistency
33. Custom App Content
! App devs may "opt in" their custom configurations for replication under search head clustering
! Example server.conf from an app would look like:
[shclustering]
conf_replication_include.my_custom_file = true
34. System Configurations
! Recall: only changes to search and UI configurations are replicated across the search head cluster automatically
! Changes to system configurations are not replicated automatically because of their high potential impact
! How are system configurations kept consistent, then?
35. Configuration Deployment
! Deployer: a single, well-controlled instance outside of the cluster
! Configurations should be tested on dev/QA instances prior to deploy
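A sketch of how the deployer ties into the cluster (hostnames and the key are placeholders for your environment): apps are staged on the deployer and pushed to all members, and each member knows where to fetch the configuration bundle from via its own server.conf:

```ini
# On each cluster member: server.conf points at the deployer
[shclustering]
conf_deploy_fetch_url = https://deployer.example.com:8089
pass4SymmKey = <shared_cluster_key>

# On the deployer: stage app configurations under
#   $SPLUNK_HOME/etc/shcluster/apps/<app_name>/
# then push the bundle to the cluster with:
#   ./splunk apply shcluster-bundle -target https://sh1.example.com:8089
```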
38. Best Practices
! Deployer instance
– Can piggyback on the Cluster Master or Deployment Server
– Recommendation is to run the Deployer on a separate instance
! Run the CLI to get status about the SHC
– ./splunk show shcluster-status