Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015

Data Through Splunk

Ledion Bitincka (ledion@splunk.com)
Alex Batsakis (abatsakis@splunk.com)
Architects

Spelunking: to explore underground caves
Splunking: to explore machine data

Splunk

Make machine data accessible, usable and valuable to everyone.

What Does Machine Data Look Like?

[Figure: sample machine data from four sources: Twitter, Care IVR, Middleware Error, Order Processing]

Machine Data Contains Critical Insights

[Figure: the same four sources (Twitter, Care IVR, Middleware Error, Order Processing) with fields called out: Customer ID, Order ID, Customer's Tweet, Time Waiting On Hold, Twitter ID, Product ID, Company's Twitter ID]

Machine Data Contains Critical Insights

[Figure (build of the previous slide): the same sources, with Order ID, Customer ID and Twitter ID called out again across several sources, plus a Web Services label]

Search, Investigate and Explore Your Data

Find and fix issues and incidents dramatically faster across your organization.

[Figure labels: Energy, Manufacturing, Shipping RFID, Web Services, Developers, App Support, Telecoms, Networking, Desktops, Servers, Security, Databases/DWH, Storage, Messaging, Online Shopping Carts, Clickstream, GPS/Cellular, Social Media; Search and Investigate, Proactive Monitoring and Alerting, Operational Visibility, Real-time Business Insight]

Turning Machine Data into Operational Intelligence

[Figure labels: Proactive, Reactive]

Let’s drill down...

Massive Linear Scalability to 100s of TBs/Day

- Auto load-balanced forwarding to as many Splunk Indexers as you need to index TBs/day
- Offload search load to Splunk Search Heads
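To make the forwarding idea concrete, here is a minimal sketch in which each chunk of data goes to exactly one indexer, so adding indexers spreads the daily volume across more machines. It assumes a plain round-robin policy over a hypothetical list of indexer addresses; a real Splunk forwarder's target-selection and switching rules are not modeled and may differ.

```python
from itertools import cycle

def forward(chunks, indexers):
    """Assign each data chunk to one indexer, round-robin.

    Hypothetical helper: a simplified model of load-balanced
    forwarding, not the forwarder's actual selection policy.
    """
    targets = cycle(indexers)
    assignments = {idx: [] for idx in indexers}
    for chunk in chunks:
        assignments[next(targets)].append(chunk)
    return assignments

if __name__ == "__main__":
    # Hypothetical indexer addresses, for illustration only.
    indexers = ["idx1:9997", "idx2:9997", "idx3:9997"]
    chunks = [f"chunk-{i}" for i in range(7)]
    for idx, assigned in forward(chunks, indexers).items():
        print(idx, assigned)
```

With three indexers, seven chunks split 3/2/2; doubling the indexer list roughly halves the per-indexer share, which is the "linear" part of the claim.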

How data moves through Splunk

Consider this chunk of data from a log file:

    /var/log/secure.log
    ...
    2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
    2013/07/01T14:31:24.234-0400 Sean is originally Canadian
    2013/07/01T14:30:50.234-0400 Brian spends his time in:
    - Kentucky with phone number 345.567.3456
    - New Jersey
    2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
    - Tijuana: 345 Main St.
    - Saskatchewan: 3 One Lane
    - Colombia: 567 White line Dr. Bogota
    2013/07/01T14:33:24.234-0400 Cesar prefers Bourbon Manhattans over beer
    2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
    2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them
    ...
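The chunk mixes single-line and multi-line events, so it has to be broken into events before indexing. A simplified sketch, assuming every event starts with a timestamp shaped like the ones above (YYYY/MM/DDTHH:MM:SS.mmm±zzzz) and every other line continues the previous event; Splunk's own line breaking and merging are configurable and more general than this single regex, and `break_events` is a hypothetical name.

```python
import re

# Assumption: an event begins at a line starting with a timestamp such as
# 2013/07/01T14:30:24.234-0400; any other line continues the current event.
EVENT_START = re.compile(r"^\d{4}/\d{2}/\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}")

SAMPLE_LOG = """2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
2013/07/01T14:31:24.234-0400 Sean is originally Canadian
2013/07/01T14:30:50.234-0400 Brian spends his time in:
- Kentucky with phone number 345.567.3456
- New Jersey
2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
- Tijuana: 345 Main St.
- Saskatchewan: 3 One Lane
- Colombia: 567 White line Dr. Bogota
2013/07/01T14:33:24.234-0400 Cesar prefers Bourbon Manhattans over beer
2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them
"""

def break_events(text):
    """Group raw lines into events, keeping multi-line events together."""
    events = []
    for line in text.splitlines():
        if EVENT_START.match(line):
            events.append([line])      # a timestamp opens a new event
        elif events and line.strip():
            events[-1].append(line)    # continuation of a multi-line event
        # lines seen before the first timestamp are ignored
    return ["\n".join(lines) for lines in events]

if __name__ == "__main__":
    for n, event in enumerate(break_events(SAMPLE_LOG)):
        print(n, repr(event))
```

On the sample this yields seven events; Brian's "spends his time in" event keeps its two list lines.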

Pipeline Data (pData)

    Host     my_host
    Index    my_index
    _raw     2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
             2013/07/01T14:31:24.234-0400 Sean is originally Canadian
             2013/07/01T14:30:50.234-0400 Brian spends his time in:
             ...
    _conf    <key here>
             UTF-8, Line Broken

Pipelines/Processors

    Inputs (TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline)
      -> Parsing Queue -> Parsing Pipeline: utf8, linebreaker, header
      -> Agg Queue     -> Merging Pipeline: aggregator
      -> Typing Queue  -> Typing Pipeline:  regex replacement, annotator
      -> Index Queue   -> Index Pipeline:   tcp out, syslog out, indexer

[Figure: a queue holding pData items; a thread removes pData from one queue, processes it and inserts it into the next queue]

- Queue size bounded by memory
- Variable-size Pipeline Data
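The queue-and-pipeline arrangement is a producer/consumer chain. A minimal sketch, assuming one thread per pipeline and bounded `queue.Queue` objects between them; the processors only record their names (mirroring the processor names above, with the Index pipeline reduced to "indexer"), so this is an illustration of the flow, not Splunk code. A full downstream queue blocks the upstream thread, which is how backpressure propagates.

```python
import queue
import threading

STOP = object()  # sentinel that shuts the chain down

def tag(stage):
    """Hypothetical processor: records that `stage` touched the pData."""
    def proc(pdata):
        return {**pdata, "stages": pdata["stages"] + [stage]}
    return proc

PIPELINES = [  # (pipeline, processors), in the order shown above
    ("parsing", [tag("utf8"), tag("linebreaker"), tag("header")]),
    ("merging", [tag("aggregator")]),
    ("typing", [tag("regex replacement"), tag("annotator")]),
    ("index", [tag("indexer")]),
]

def run_pipeline(processors, inq, outq):
    """Remove pData from inq, run every processor, insert into outq."""
    while True:
        pdata = inq.get()
        if pdata is STOP:
            outq.put(STOP)
            return
        for proc in processors:
            pdata = proc(pdata)
        outq.put(pdata)  # blocks while outq is full (backpressure)

def index_all(raw_chunks, maxsize=2):
    # Bounded Parsing/Agg/Typing/Index queues, plus an unbounded sink
    # so the last pipeline can always drain.
    queues = [queue.Queue(maxsize=maxsize) for _ in PIPELINES] + [queue.Queue()]
    threads = [
        threading.Thread(target=run_pipeline, args=(procs, queues[i], queues[i + 1]))
        for i, (_, procs) in enumerate(PIPELINES)
    ]
    for t in threads:
        t.start()
    for raw in raw_chunks:
        queues[0].put({"_raw": raw, "stages": []})
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each pipeline is a single thread reading a FIFO queue, pData leaves the chain in the order it entered.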

Persistent Queue

[Figure: a Splunk host whose Input Q and Tcpout Q hold pData, sending it over the Network to an indexer running the same Parsing, Merging, Typing and Index pipelines shown above. Callouts: "Internal Queues Full" and "Persistent Q: Much Bigger Queue".]
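A sketch of the persistent-queue idea under simplifying assumptions: a small in-memory queue that, once full, spills further items to a much bigger overflow store while keeping FIFO order. Here the "disk" is simulated with a second deque (a real persistent queue writes to files), and `SpillingQueue` and its capacity parameter are hypothetical.

```python
from collections import deque

class SpillingQueue:
    """FIFO queue: bounded memory first, then a larger overflow store.

    Hypothetical class; the overflow deque stands in for on-disk storage.
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()
        self.overflow = deque()  # simulated persistent (disk) queue

    def __len__(self):
        return len(self.memory) + len(self.overflow)

    def put(self, item):
        # Once anything has spilled, keep spilling so order is preserved.
        if self.overflow or len(self.memory) >= self.memory_capacity:
            self.overflow.append(item)
        else:
            self.memory.append(item)

    def get(self):
        if self.memory:
            item = self.memory.popleft()
        elif self.overflow:
            item = self.overflow.popleft()
        else:
            raise IndexError("get from empty SpillingQueue")
        # Refill memory from the overflow store, oldest items first.
        while self.overflow and len(self.memory) < self.memory_capacity:
            self.memory.append(self.overflow.popleft())
        return item
```

The invariant that makes this FIFO: items enter memory only while nothing has spilled, so everything in memory is older than everything in the overflow store.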

What’s an index?

Collective term used to describe rawdata and the associated tsidx & metadata files.

Inside an index

    [09:31:39] [1065]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/
    $ ls -l
    total 0
    drwx------   2 lbitincka  admin   68 Feb  6 12:57 colddb
    drwx------  17 lbitincka  admin  578 Jul  1 09:31 db
    drwx------  13 lbitincka  admin  442 Jun 27 16:36 summary
    drwx------   2 lbitincka  admin   68 Aug 24  2012 thaweddb

Callouts: Index name (the _internaldb directory) and Bucket locations.

Inside the hot & warm path

    [10:20:00] [1074]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/
    $ ll
    -rw-------   1 lbitincka  admin  1.3K Jun 27 13:50 .bucketManifest
    drwx------  17 lbitincka  admin  578B Jul  1 10:19 .
    drwx--x--x  17 lbitincka  admin  578B Jun 26 12:45 db_1372264972_1371998026_159
                16 lbitincka  admin  544B Jun 18 08:20 db_1371225002_1370897127_156
                16 lbitincka  admin  544B Jun 26 12:50 db_1371998025_1371214200_158
                14 lbitincka  admin  476B Jun 26 12:50 db_1372265194_1372264972_160
                14 lbitincka  admin  476B Jul  1 10:19 hot_v1_161
    drwx------   6 lbitincka  admin  204B Nov 12  2012 ..
                 2 lbitincka  admin   68B Aug 24  2012 GlobalMetaData
    -rw-------   1 lbitincka  admin   10B Aug 24  2012 CreationTime
    -rw-------   1 lbitincka  admin    0B Dec 21  2012 .db_1356066789_1355865285_43.rbsentinel

Notice the hot & warm buckets.

Bucket names: db_<lt>_<et>_<id>, where lt is the latest event time and et the earliest event time in the bucket (both epoch seconds) and id is the bucket ID.
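Because warm bucket directory names carry the time range of their events, a search can skip buckets by name alone. A minimal sketch, assuming the db_<lt>_<et>_<id> pattern above (epoch seconds) and treating hot_v1_<id> buckets as always in range since their time span is still growing; `parse_bucket` and `buckets_for_range` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

class Bucket(NamedTuple):
    name: str
    latest: Optional[int]    # lt, epoch seconds (None for hot buckets)
    earliest: Optional[int]  # et, epoch seconds (None for hot buckets)
    bucket_id: int

WARM = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")
HOT = re.compile(r"^hot_v1_(\d+)$")

def parse_bucket(name):
    """Parse a bucket directory name; return None for non-bucket entries."""
    if m := WARM.match(name):
        lt, et, bid = map(int, m.groups())
        return Bucket(name, lt, et, bid)
    if m := HOT.match(name):
        return Bucket(name, None, None, int(m.group(1)))
    return None  # e.g. GlobalMetaData, CreationTime, .rbsentinel files

def buckets_for_range(names, search_earliest, search_latest):
    """Keep buckets whose [et, lt] range overlaps the search window."""
    selected = []
    for name in names:
        b = parse_bucket(name)
        if b is None:
            continue
        if b.latest is None or (b.earliest <= search_latest and b.latest >= search_earliest):
            selected.append(b.name)
    return selected
```

Against the listing above, a window from 1372000000 to 1372100000 keeps only db_1372264972_1371998026_159 and the hot bucket; the other warm buckets are ruled out without opening them.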

Inside a bucket

    [10:31:32] [1092]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/db_1371998025_1371214200_158/
    $ ll
    -rw-------   1 lbitincka  admin   27M Jun 21 16:49 1371847782-1371214200-1941140693112088843.tsidx
    -rw-------   1 lbitincka  admin  7.1M Jun 26 12:43 1371998025-1371847783-907852835360656754.tsidx
    -rw-------   1 lbitincka  admin  2.5M Jun 26 12:43 merged_lexicon.lex
    -rw-------   1 lbitincka  admin  459K Jun 26 12:43 bloomfilter
    -rw-------   1 lbitincka  admin  1.3K Jun 23 10:33 Sources.data
    -rw-------   1 lbitincka  admin  615B Jun 23 10:33 SourceTypes.data
    drwx------  17 lbitincka  admin  578B Jul  1 10:31 ..
                16 lbitincka  admin  544B Jun 26 12:50 .
    -rw-------   1 lbitincka  admin  451B Jun 23 10:31 Strings.data
    drwx------   4 lbitincka  admin  136B Jun 26 12:42 rawdata
    -rw-------   1 lbitincka  admin  116B Jun 23 10:33 Hosts.data
    -rw-------   1 lbitincka  admin   76B Jun 23 10:33 splunk-autogen-params.dat
    -rw-------   1 lbitincka  admin   52B Jun 26 12:50 bucket_info.csv
    -rw-------   1 lbitincka  admin   49B Jun 26 12:43 optimize.result
    -rw-------   1 lbitincka  admin   10B Jun 26 12:43 .rawSize
    -rw-------   1 lbitincka  admin    8B Jun 26 12:43 .sizeManifest4.1

Metadata & Bloomfilters

- *.data
  - Metadata about the sources, sourcetypes and hosts of the events contained in each bucket
- Bloomfilters
  - Efficient data structure that authoritatively rules out buckets
    - i.e. tells you with 100% certainty that a query term is NOT present in a bucket
  - Consulted by every search by default
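A minimal Bloom filter sketch shows why a negative answer is authoritative: every added term sets k bits, so if any of a query term's bits is clear, that term was never added. A positive answer may be a false positive, so a "maybe" still requires reading the bucket's tsidx. The bit-array size, hash count and SHA-256-based hashing are arbitrary choices for illustration, not Splunk's bloomfilter format; `buckets_to_search` is a hypothetical helper.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive k bit positions by hashing the term with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(term))

def buckets_to_search(bloom_by_bucket, term):
    """Rule out every bucket whose filter says the term is definitely absent."""
    return [name for name, bf in bloom_by_bucket.items() if bf.might_contain(term)]
```

The filter never produces a false negative, which is exactly the property that lets a search skip a bucket without touching its tsidx.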

Rawdata (not raw data)

- A collection of compressed (gzipped) blocks, called slices
  - Concatenated together in rawdata/journal.gz
  - Think "cat chunkA.gz chunkB.gz ... chunkN.gz > journal.gz"
- Slices contain the actual raw events.
- The pool of concatenated slices can be seeked into
  - The offsets are pointed to by the seek addresses in the tsidx values array.
- This organization lets us zoom in on the right slice
  - which reduces decompression time & volume compared to having a single, massive rawdata file.
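The slice layout can be imitated with Python's standard `gzip` module: compressing each slice separately and concatenating the results yields a valid multi-member gzip stream (the "cat" trick), and remembering where each slice starts lets a reader decompress only the slice it needs. A sketch under those assumptions; Splunk's real journal format and slice size are not reproduced, and `build_journal` / `read_slice` are hypothetical names.

```python
import gzip

def build_journal(slices):
    """Compress each slice on its own and concatenate the gzip members.

    Returns the journal bytes and the starting byte offset of each slice.
    """
    journal = bytearray()
    offsets = []
    for events in slices:
        offsets.append(len(journal))
        journal += gzip.compress("\n".join(events).encode("utf-8"))
    return bytes(journal), offsets

def read_slice(journal, offsets, index):
    """Decompress only the slice that starts at offsets[index]."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(journal)
    return gzip.decompress(journal[start:end]).decode("utf-8").split("\n")
```

For example, with slices `[["e0", "e1"], ["e2"], ["e3", "e4"]]`, `read_slice(journal, offsets, 2)` returns `["e3", "e4"]` after decompressing one member, while `gzip.decompress(journal)` on the whole file yields the three slices' contents concatenated.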

TSIDX

- Time series index (an inverted index optimized for time)
- Lexicon:
  - Keywords within the specified time range
  - Postings list array
- Values array:
  - Structure that contains posting values, seek address, _time, etc.
  - Seek address points to offsets in rawdata
- Time is of transcendent importance in Splunk:
  - tsidx filenames expose et and lt
  - Values arrays are arranged in time order as well

Lexicon

[Slide repeats the sample log chunk shown above, next to its lexicon.]

    Term      Posting List
    3         4
    345       3,4
    …         …
    Africa    0
    Brian     0,2
    Bogota    4
    …         …
    Matty     5,6
    Tijuana   4
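A lexicon with posting lists is an inverted index. A minimal sketch over the sample events, assuming a deliberately simple tokenizer (lowercase, split on anything that is not a letter or digit) rather than Splunk's configurable segmentation, timestamps omitted, and events numbered 0 to 6 in log order. The resulting posting lists therefore need not match the illustrative numbers in the table above; `build_lexicon` is a hypothetical name.

```python
import re
from collections import defaultdict

# Sample events in log order (timestamps omitted for brevity).
EVENTS = [
    "Brian pretends to be from South Africa",
    "Sean is originally Canadian",
    "Brian spends his time in:\n- Kentucky with phone number 345.567.3456\n- New Jersey",
    "Matty has lived in the following cities:\n- Tijuana: 345 Main St.\n"
    "- Saskatchewan: 3 One Lane\n- Colombia: 567 White line Dr. Bogota",
    "Cesar prefers Bourbon Manhattans over beer",
    "Matty loves GiGi Mellow Burgers",
    "Sean is not the only one to not like them",
]

TOKEN = re.compile(r"[a-z0-9]+")  # simplified tokenizer (an assumption)

def build_lexicon(events):
    """Map each term to the sorted list of event numbers that contain it."""
    postings = defaultdict(set)
    for event_id, text in enumerate(events):
        for term in TOKEN.findall(text.lower()):
            postings[term].add(event_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

if __name__ == "__main__":
    lexicon = build_lexicon(EVENTS)
    for term in ["345", "africa", "brian", "bogota", "matty", "tijuana"]:
        print(term, lexicon[term])
```

With this tokenizer the phone number 345.567.3456 contributes the terms 345, 567 and 3456, so "345" appears in events 2 and 3.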

Values Array

[Slide repeats the sample log chunk shown above, next to its values array.]

    Posting  Seek addr  _time       host     …
    0        130        1372689024  my_host  …
    1        150        1372689084  my_host  …
    2        190        1372689050  my_host  …
    3        389        1372689050  my_host  …
    4        589        1372689050  my_host  …
    5        800        1372689050  my_host  …
    6        1399       1372689050  my_host  …
    …        …
    …        …

*All values are for illustration purposes and not necessarily accurate.
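The values array can be modeled as a list indexed by posting number, each entry holding a seek address into rawdata plus per-event fields such as _time and host. A sketch, assuming _time is epoch seconds parsed from each event's own timestamp (offset included) and that seek addresses are byte offsets into an uncompressed, newline-joined stand-in for rawdata; `ValueEntry` and `build_values_array` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ValueEntry:
    seek_addr: int  # byte offset of the event in the uncompressed raw text
    time: float     # _time, epoch seconds
    host: str

TS_FORMAT = "%Y/%m/%dT%H:%M:%S.%f%z"  # e.g. 2013/07/01T14:30:24.234-0400

def build_values_array(events, host):
    """Return one ValueEntry per event, indexed by posting number."""
    values, offset = [], 0
    for raw in events:
        timestamp = raw.split(" ", 1)[0]
        when = datetime.strptime(timestamp, TS_FORMAT).timestamp()
        values.append(ValueEntry(offset, when, host))
        offset += len(raw.encode("utf-8")) + 1  # +1 for the newline separator
    return values
```

Parsed this way, 2013/07/01T14:30:24.234-0400 becomes _time 1372703424.234 (18:30:24 UTC). The table's 1372689024 corresponds to 14:30:24 UTC, i.e. it ignores the -0400 offset, which is consistent with its "illustration purposes" note.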

Tsidx merging

- Streaming data produces many small tsidx files
- Searching is inefficient when going against many tsidx files
- splunk-optimize
  - Merges small tsidx files into larger ones
  - Consolidates lexicons and posting lists
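At the lexicon level, merging tsidx files means combining, for each term, the posting lists from the small files into one sorted list. A sketch, assuming each small file is represented as a dict from term to a sorted posting list and that posting numbers are already global across the files; real tsidx merging also has to rewrite the values arrays, which is omitted, and `merge_lexicons` is a hypothetical name, not splunk-optimize's implementation.

```python
import heapq
from collections import defaultdict

def merge_lexicons(lexicons):
    """Combine several {term: sorted postings} dicts into one lexicon."""
    per_term = defaultdict(list)
    for lexicon in lexicons:
        for term, postings in lexicon.items():
            per_term[term].append(postings)
    merged = {}
    for term in sorted(per_term):
        # heapq.merge lazily merges already-sorted inputs.
        merged[term] = list(heapq.merge(*per_term[term]))
    return merged
```

After merging, a search consults one lexicon per term instead of one per small file, which is where the speedup comes from.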

Putting it together

[Figure: indexers IDX 1, IDX 2 and IDX 3, with a Home Path, Cold Path and Thawed Path holding buckets such as hot_v1_100, hot_v1_101, db_lt_et_80, db_lt_et_101 and db_lt_et_70. Each bucket contains *.data (Source/Sourcetype/Host metadata, e.g. "1 source::/my/log", "2 source::/blah"), *.tsidx and rawdata. A tsidx maps a LEXICON (apple, beer, coke, cream, ice, java, …) to POSTINGS that point into rawdata events such as "apple pie and ice cream is delicious" and "an apple a day keeps doctor away".]

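Bucket Lifecycle

[Figure: Events enter hot buckets in $Home Path. [Hot Bucket is Full]: hot rolls to warm, still in $Home Path. [Too Many Warms]: warm rolls to $Cold Path [Cheaper Storage]. [Out of Space or Bucket is Old]: cold moves to $Frozen Path or is Deleted. [Explicit User Action]: frozen data is restored to $Thawed Path.]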
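The lifecycle reads naturally as a state machine. A sketch, assuming the transitions in the diagram (hot to warm when the hot bucket is full, warm to cold when there are too many warm buckets, cold to frozen or deleted when out of space or the bucket is old, frozen to thawed on explicit user action); `Stage`, `Trigger` and `advance` are hypothetical names, and real thresholds come from index configuration rather than from this table.

```python
from enum import Enum

class Stage(Enum):
    HOT = "hot ($Home Path)"
    WARM = "warm ($Home Path)"
    COLD = "cold ($Cold Path)"
    FROZEN = "frozen ($Frozen Path) or deleted"
    THAWED = "thawed ($Thawed Path)"

class Trigger(Enum):
    HOT_BUCKET_FULL = "Hot Bucket is Full"
    TOO_MANY_WARMS = "Too Many Warms"
    OUT_OF_SPACE_OR_OLD = "Out of Space or Bucket is Old"
    EXPLICIT_USER_ACTION = "Explicit User Action"

# Allowed transitions, as read from the lifecycle diagram.
TRANSITIONS = {
    (Stage.HOT, Trigger.HOT_BUCKET_FULL): Stage.WARM,
    (Stage.WARM, Trigger.TOO_MANY_WARMS): Stage.COLD,
    (Stage.COLD, Trigger.OUT_OF_SPACE_OR_OLD): Stage.FROZEN,
    (Stage.FROZEN, Trigger.EXPLICIT_USER_ACTION): Stage.THAWED,
}

def advance(stage, trigger):
    """Return the next stage, or raise ValueError if the trigger does not apply."""
    try:
        return TRANSITIONS[(stage, trigger)]
    except KeyError:
        raise ValueError(f"{trigger.value!r} does not apply to a {stage.name} bucket") from None
```

Keeping the transitions in one table makes illegal moves (for example, a hot bucket jumping straight to cold) an explicit error.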

How do we search?

- Consult the lexicon and combine the posting lists
  - brian OR tijuana => (0, 2) OR (4) = (0, 2, 4)
- Use the values array to get the seek address, _time, source and sourcetype for (0, 2, 4)
- Use the seek addresses to read rawdata at offsets (130, 190, 589)
- Send the "results" to the search
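The four search steps can be sketched end to end. Assumptions: the lexicon and values array are small in-memory structures built from a handful of events, rawdata is an uncompressed bytes string, each values entry stores a length alongside the seek address (an addition for convenience), _time/source/sourcetype are left out, and only OR queries are handled. `build_index` and `search_or` are hypothetical helpers, not Splunk's search code.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+")  # simplified tokenizer (an assumption)

def build_index(events):
    """Build rawdata bytes, a lexicon (term -> postings) and a values array."""
    rawdata, lexicon, values = bytearray(), {}, []
    for posting, event in enumerate(events):
        data = event.encode("utf-8")
        values.append({"seek": len(rawdata), "length": len(data)})
        rawdata += data + b"\n"
        for term in set(TOKEN.findall(event.lower())):
            lexicon.setdefault(term, []).append(posting)
    return bytes(rawdata), lexicon, values

def search_or(terms, rawdata, lexicon, values):
    # 1. Consult the lexicon and combine (union) the posting lists.
    postings = sorted(set().union(*(lexicon.get(t.lower(), []) for t in terms)))
    # 2. Use the values array to get each event's seek address.
    seeks = [(values[p]["seek"], values[p]["length"]) for p in postings]
    # 3. Read rawdata at those offsets.
    results = [rawdata[s:s + n].decode("utf-8") for s, n in seeks]
    # 4. Hand the results on to the rest of the search.
    return postings, results
```

For four events "Brian pretends to be from South Africa", "Sean is originally Canadian", "Brian spends his time in: Kentucky" and "Matty has lived in Tijuana", `search_or(["brian", "tijuana"], ...)` returns postings [0, 2, 3] and reads back exactly those three events.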

Search Model Example

    sourcetype=syslog ERROR | top user | fields - percent

    Disk
      -> Fetch events from disk, apply schema   -> Intermediate results table
      -> Summarize into table of top 10 users   -> Intermediate results table
      -> Remove column showing percentage       -> Final results table
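The piped search can be mimicked in Python to show how each command consumes the previous command's results table. A sketch under stated assumptions: events are dicts with a sourcetype and raw text, the schema applied at read time is a hypothetical `user=<name>` regex, keyword matching is a case-insensitive substring test, and `top` returns up to 10 rows with count and percent columns. None of this is Splunk's implementation.

```python
import re
from collections import Counter

USER_FIELD = re.compile(r"\buser=(\w+)")  # hypothetical field-extraction rule

def fetch(events, sourcetype, keyword):
    """Fetch matching events and apply the schema (extract `user`) at read time."""
    rows = []
    for event in events:
        if event["sourcetype"] == sourcetype and keyword.lower() in event["_raw"].lower():
            match = USER_FIELD.search(event["_raw"])
            rows.append({"_raw": event["_raw"], "user": match.group(1) if match else None})
    return rows

def top(rows, field, limit=10):
    """Summarize into a table of the most common values of `field`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    total = len(values)
    return [{field: v, "count": c, "percent": 100.0 * c / total}
            for v, c in Counter(values).most_common(limit)]

def remove_fields(rows, *names):
    """Like `fields - name`: drop the named columns."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]

def run(events):
    # sourcetype=syslog ERROR | top user | fields - percent
    return remove_fields(top(fetch(events, "syslog", "ERROR"), "user"), "percent")
```

Each function takes a table and returns a new one, which is the "*nix pipes" model: the fetch step alone touches the raw events, and the later steps only see intermediate results.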

What can we do with events?

- It’s not just search...
- SPL = Search Processing Language
  - Inspired by *nix pipes
  - Schema on read
  - 130+ search commands for slicing through data
- Versatile visualization library
- Scheduling and alerting
- ...

[Figure labels: LOB Owners/Executives, System Administrator, Operations Teams, Security Analysts, IT Executives, Application Developers, Auditors, Website/Business Analysts, Customer Support]

[Figure labels: IT Operations Management, Web Intelligence, Business Analytics, Application Management, Security and Compliance]

Take it for a spin...

http://www.splunk.com/download/
- Download
- Try Splunk Cloud (AWS)

WE’RE HIRING!! (in SF & the valley)
