Data Through Splunk

Ledion Bitincka (ledion@splunk.com)
Alex Batsakis (abatsakis@splunk.com)
Architects

Spelunking: to explore underground caves
Splunking: to explore machine data

Splunk
Make machine data accessible, usable and valuable to everyone.

What Does Machine Data Look Like?

Sources: Twitter, Care IVR, Middleware Error, Order Processing

Machine Data Contains Critical Insights

Sources: Twitter, Care IVR, Middleware Error, Order Processing
Fields highlighted in the sample data: Customer ID, Order ID, Product ID, Time Waiting On Hold, Twitter ID, Customer's Tweet, Company's Twitter ID
Customer ID and Order ID appear in more than one source.

Search, Investigate and Explore Your Data

Find and fix issues and incidents dramatically faster across your organization

Data sources: Energy, Manufacturing, Shipping, RFID, Web Services, Developers, App Support, Telecoms, Networking, Desktops, Servers, Security, Databases/DWH, Storage, Messaging, Online Shopping Carts, Clickstream, GPS/Cellular, Social Media

Turning Machine Data into Operational Intelligence

From Reactive to Proactive:
  • Search and Investigate
  • Proactive Monitoring and Alerting
  • Operational Visibility
  • Real-time Business Insight

Let's drill down …

Massive Linear Scalability to 100s of TBs/Day

  • Auto load-balanced forwarding to as many Splunk Indexers as you need to index TB/day
  • Offload search load to Splunk Search Heads

How data moves thru Splunk

Consider this chunk of data from a log file:

/var/log/secure.log
...
2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
2013/07/01T14:31:24.234-0400 Sean is originally Canadian
2013/07/01T14:30:50.234-0400 Brian spends his time in:
 - Kentucky with phone number 345.567.3456
 - New Jersey
2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
 - Tijuana: 345 Main St.
 - Saskatchewan: 3 One Lane
 - Colombia: 567 White line Dr. Bogota
2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer
2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them
...

Pipeline Data

Host     my_host
Index    my_index
_raw     2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
         2013/07/01T14:31:24.234-0400 Sean is originally Canadian
         2013/07/01T14:30:50.234-0400 Brian spends his time in: ...
UTF-8    Line Broken
_conf    <key here>

Pipelines/Processors

Input pipelines: TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline

Parsing Queue → Parsing Pipeline: utf8, linebreaker, header
Agg Queue     → Merging Pipeline: aggregator
Typing Queue  → Typing Pipeline: regex replacement, annotator
Index Queue   → Index Pipeline: tcp out, syslog out, indexer

Queue

Diagram labels: Thread, Process, Insert, Queue (pData, pData, pData, pData), Remove, Process, Thread

  ✓ Queue size bounded by memory
  ✓ Variable size Pipeline Data
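
The two points above can be sketched as a queue whose capacity is a byte budget rather than an item count, so that variable-size items are accounted for by their actual size. This is only an illustration in Python, not Splunk's implementation; the class name `ByteBoundedQueue`, the `max_bytes` parameter and the use of `bytes` payloads to stand in for pData are all assumptions.

```python
import threading
from collections import deque


class ByteBoundedQueue:
    """FIFO queue limited by total payload bytes, not by item count (hypothetical)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, payload, timeout=None):
        """Insert a payload, blocking while the byte budget is exhausted.

        Returns False if the timeout expires before enough space frees up.
        """
        size = len(payload)
        if size > self.max_bytes:
            raise ValueError("payload larger than the whole queue")
        with self._cond:
            has_room = self._cond.wait_for(
                lambda: self.used_bytes + size <= self.max_bytes, timeout
            )
            if not has_room:
                return False
            self._items.append(payload)
            self.used_bytes += size
            self._cond.notify_all()
            return True

    def get(self, timeout=None):
        """Remove the oldest payload; return None if nothing arrives in time."""
        with self._cond:
            has_item = self._cond.wait_for(lambda: len(self._items) > 0, timeout)
            if not has_item:
                return None
            payload = self._items.popleft()
            self.used_bytes -= len(payload)
            self._cond.notify_all()
            return payload
```

A producer thread that calls `put` on a full queue simply waits, which is one way a slow downstream stage can push back on the stage feeding it.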
  
Persistent Queue

Diagram labels (Splunk Host): Network, Input Q, pData, pData, Tcpout Q, Network; Internal Queues Full; Persistent Q (Much Bigger Queue); Full

Indexing

Input pipelines: TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline

Parsing Queue → Parsing Pipeline: utf8, linebreaker, header
Agg Queue     → Merging Pipeline: aggregator
Typing Queue  → Typing Pipeline: regex replacement, annotator
Index Queue   → Index Pipeline: tcp out, syslog out, indexer

What's an index

Collective term used to describe rawdata and associated tsidx & metadata files.

Inside an index

[09:31:39] [1065]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/
 $ ls -l
total 0
drwx------   2 lbitincka  admin   68 Feb  6 12:57 colddb
drwx------  17 lbitincka  admin  578 Jul  1 09:31 db
drwx------  13 lbitincka  admin  442 Jun 27 16:36 summary
drwx------   2 lbitincka  admin   68 Aug 24  2012 thaweddb

Slide callouts: Index name, Bucket locations

Inside hot & warm path

[10:20:00] [1074]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/
 $ ll
-rw-------   1 lbitincka  admin  1.3K Jun 27 13:50 .bucketManifest
drwx------  17 lbitincka  admin  578B Jul  1 10:19 .
drwx--x--x  17 lbitincka  admin  578B Jun 26 12:45 db_1372264972_1371998026_159
drwx--x--x  16 lbitincka  admin  544B Jun 18 08:20 db_1371225002_1370897127_156
drwx--x--x  16 lbitincka  admin  544B Jun 26 12:50 db_1371998025_1371214200_158
drwx--x--x  14 lbitincka  admin  476B Jun 26 12:50 db_1372265194_1372264972_160
drwx--x--x  14 lbitincka  admin  476B Jul  1 10:19 hot_v1_161
drwx------   6 lbitincka  admin  204B Nov 12  2012 ..
drwx--x--x   2 lbitincka  admin   68B Aug 24  2012 GlobalMetaData
-rw-------   1 lbitincka  admin   10B Aug 24  2012 CreationTime
-rw-------   1 lbitincka  admin    0B Dec 21  2012 .db_1356066789_1355865285_43.rbsentinel

Notice hot & warm buckets
Bucket names: db_<lt>_<et>_<id>
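
Because the bucket directory name carries its own time span, a tool can decide whether a bucket is relevant to a time range without opening it. The sketch below parses the db_<lt>_<et>_<id> pattern, reading lt as the latest and et as the earliest event time in epoch seconds (an assumption that matches the listing, where the first number is always the larger one). The names `Bucket`, `parse_bucket_name` and `overlaps` are hypothetical helpers, not part of Splunk.

```python
import re
from typing import NamedTuple, Optional


class Bucket(NamedTuple):
    latest: int     # lt: newest event time in the bucket (epoch seconds)
    earliest: int   # et: oldest event time in the bucket (epoch seconds)
    bucket_id: int


# Matches warm bucket directories only, e.g. db_1371998025_1371214200_158
_WARM_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_bucket_name(name: str) -> Optional[Bucket]:
    """Parse db_<lt>_<et>_<id>; return None for hot buckets and other entries."""
    match = _WARM_NAME.match(name)
    if match is None:
        return None
    lt, et, bucket_id = (int(group) for group in match.groups())
    return Bucket(latest=lt, earliest=et, bucket_id=bucket_id)


def overlaps(bucket: Bucket, search_et: int, search_lt: int) -> bool:
    """True if the bucket's time span intersects the search time range."""
    return bucket.earliest <= search_lt and bucket.latest >= search_et
```

For example, `parse_bucket_name("db_1371998025_1371214200_158")` yields a bucket with id 158, while `hot_v1_161` returns `None` because hot buckets do not yet have a fixed time span in their name.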
  
Inside a bucket

[10:31:32] [1092]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/db_1371998025_1371214200_158/
 $ ll
-rw-------   1 lbitincka  admin   27M Jun 21 16:49 1371847782-1371214200-1941140693112088843.tsidx
-rw-------   1 lbitincka  admin  7.1M Jun 26 12:43 1371998025-1371847783-907852835360656754.tsidx
-rw-------   1 lbitincka  admin  2.5M Jun 26 12:43 merged_lexicon.lex
-rw-------   1 lbitincka  admin  459K Jun 26 12:43 bloomfilter
-rw-------   1 lbitincka  admin  1.3K Jun 23 10:33 Sources.data
-rw-------   1 lbitincka  admin  615B Jun 23 10:33 SourceTypes.data
drwx------  17 lbitincka  admin  578B Jul  1 10:31 ..
drwx--x--x  16 lbitincka  admin  544B Jun 26 12:50 .
-rw-------   1 lbitincka  admin  451B Jun 23 10:31 Strings.data
drwx------   4 lbitincka  admin  136B Jun 26 12:42 rawdata
-rw-------   1 lbitincka  admin  116B Jun 23 10:33 Hosts.data
-rw-------   1 lbitincka  admin   76B Jun 23 10:33 splunk-autogen-params.dat
-rw-------   1 lbitincka  admin   52B Jun 26 12:50 bucket_info.csv
-rw-------   1 lbitincka  admin   49B Jun 26 12:43 optimize.result
-rw-------   1 lbitincka  admin   10B Jun 26 12:43 .rawSize
-rw-------   1 lbitincka  admin    8B Jun 26 12:43 .sizeManifest4.1
Metadata & Bloomfilters

  • *.data
      – Metadata about sources, sourcetypes and hosts of the events contained in each bucket
  • Bloomfilters
      – Efficient data structure that authoritatively rules out buckets
          · i.e. tells you with 100% certainty that a query term is NOT present in a bucket
      – By default consulted by every search
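
The "rules out with certainty" property comes from how a Bloom filter works: a term sets a few bit positions when it is added, so if any of those bits is still zero at query time the term was never added. The sketch below is a generic textbook Bloom filter in Python, not the on-disk `bloomfilter` file format; the sizes, the SHA-256 based hashing and the helper `buckets_to_search` are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(term)
        )


def buckets_to_search(filters, term):
    """Keep only the buckets whose filter cannot rule the term out."""
    return [name for name, bf in filters.items() if bf.might_contain(term)]
```

A search would then open only the buckets returned by `buckets_to_search`; a `True` answer still has to be confirmed against the index, since the filter can report terms that are not actually there.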
  
Rawdata (not raw data)

  • Collection of compressed (gzipped) blocks, called slices
      – Concatenated together in a rawdata/journal.gz
      – Think "cat chunkA.gz chunkB.gz ...chunkN.gz > journal.gz"
  • Slices contain the actual raw events
  • The pool of concatenated slices can be seeked into
      – Location offsets are pointed to by the values array pointers in tsidx
  • Such organization allows us to zoom in to the right slice
      – Reduces the amount of decompression time & volume compared to having a single, massive rawdata file
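
The "cat chunkA.gz chunkB.gz" analogy can be tried directly with Python's standard `gzip` module: independently compressed members can be concatenated, and if the start offset of each member is recorded, a single slice can be decompressed without touching the rest. This is a sketch of that idea only; the real journal.gz has its own internal layout, and `build_journal` and `read_slice` are hypothetical helpers.

```python
import gzip
import io


def build_journal(slices):
    """Concatenate independently gzipped slices; return (journal, offsets)."""
    journal = io.BytesIO()
    offsets = []
    for raw in slices:
        offsets.append(journal.tell())     # byte offset where this slice starts
        journal.write(gzip.compress(raw))
    return journal.getvalue(), offsets


def read_slice(journal, offsets, index):
    """Decompress only the slice at the given index."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(journal)
    return gzip.decompress(journal[start:end])


if __name__ == "__main__":
    journal, offsets = build_journal([b"event A\n", b"event B\n", b"event C\n"])
    print(read_slice(journal, offsets, 1))   # touches only the middle slice
    print(gzip.decompress(journal))          # the whole journal is still valid gzip
```

Reading one slice costs one seek plus the decompression of that slice, which is the saving the last bullet above describes.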
  
TSIDX

  • Time series index (inverted index optimized for time)
  • Lexicon:
      – Keywords within the specified time range
      – Postings list array
  • Values array:
      – Structure that contains posting values, seek address, _time etc.
      – Seek address points to offsets in rawdata
  • Time is of transcendent importance in Splunk
      – tsidx filenames expose et and lt
      – Values arrays arranged in time order as well

Lexicon

2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
2013/07/01T14:31:24.234-0400 Sean is originally Canadian
2013/07/01T14:30:50.234-0400 Brian spends his time in:
 - Kentucky with phone number 345.567.3456
 - New Jersey
2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
 - Tijuana: 345 Main St.
 - Saskatchewan: 3 One Lane
 - Colombia: 567 White line Dr. Bogota
2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer
2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them

Term       Posting List
3          4
345        3,4
…          …
Africa     0
Brian      0,2
Bogota     4
…          …
Matty      5,6
Tijuana    4
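
A lexicon of this shape is a plain inverted index: each term maps to the ids of the events that contain it. The sketch below builds one from the first three sample events. It is a simplification: splitting on runs of letters and digits and lowercasing are assumptions, and event ids are simply list positions, so the numbers need not match the illustrative table above.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of ASCII letters and digits are terms.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def build_lexicon(events):
    """Map each lowercased term to the sorted list of event ids containing it."""
    postings = defaultdict(set)
    for event_id, text in enumerate(events):
        for term in TOKEN.findall(text):
            postings[term.lower()].add(event_id)
    # Sort terms so the lexicon can be scanned or binary-searched in order.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


EVENTS = [
    "2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa",
    "2013/07/01T14:31:24.234-0400 Sean is originally Canadian",
    "2013/07/01T14:30:50.234-0400 Brian spends his time in:\n"
    " - Kentucky with phone number 345.567.3456\n"
    " - New Jersey",
]

if __name__ == "__main__":
    lexicon = build_lexicon(EVENTS)
    print(lexicon["brian"], lexicon["africa"], lexicon["345"])
```

Note that the multi-line event stays a single event with a single id, which is why line merging happens before indexing.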
  
Values Array

2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
2013/07/01T14:31:24.234-0400 Sean is originally Canadian
2013/07/01T14:30:50.234-0400 Brian spends his time in:
 - Kentucky with phone number 345.567.3456
 - New Jersey
2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
 - Tijuana: 345 Main St.
 - Saskatchewan: 3 One Lane
 - Colombia: 567 White line Dr. Bogota
2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer
2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them

Posting   Seek addr   _time        host      …
0         130         1372689024   my_host   …
1         150         1372689084   my_host   …
2         190         1372689050   my_host   …
3         389         1372689050   my_host   …
4         589         1372689050   my_host   …
5         800         1372689050   my_host   …
6         1399        1372689050   my_host   …
…         …

*All values are for illustration purposes and not necessarily accurate.
Tsidx merging

  • Many small tsidx files due to data streaming
  • Searching is inefficient when going against many tsidx files
  • splunk-optimize
      – Merging of small tsidx files into larger ones
      – Consolidation of lexicons and posting lists
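
What consolidating lexicons and posting lists involves can be shown with toy data. In the sketch below each small index is a pair of a lexicon (term to posting ids) and a values list; posting ids are local to their own values list, so ids from later inputs are shifted by the number of rows already merged. This is a conceptual illustration only: the data layout and `merge_indexes` are assumptions, it says nothing about how splunk-optimize works internally, and it does not re-sort the merged values by time.

```python
def merge_indexes(indexes):
    """Merge (lexicon, values) pairs into one lexicon and one values array."""
    merged_lexicon = {}
    merged_values = []
    for lexicon, values in indexes:
        shift = len(merged_values)   # rows contributed by earlier inputs
        for term, postings in lexicon.items():
            merged_lexicon.setdefault(term, []).extend(p + shift for p in postings)
        merged_values.extend(values)
    # Keep terms sorted, as in a single consolidated lexicon.
    return dict(sorted(merged_lexicon.items())), merged_values


if __name__ == "__main__":
    small_a = ({"brian": [0], "africa": [0]}, [{"seek": 130}])
    small_b = ({"brian": [0], "sean": [1]}, [{"seek": 150}, {"seek": 190}])
    lexicon, values = merge_indexes([small_a, small_b])
    print(lexicon)   # one entry per term instead of one per small file
    print(values)
```

After the merge a term lookup touches one lexicon instead of one per small file, which is the search-time benefit the bullets above point to.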
  
Putting it together

Diagram labels:
  • Indexes: IDX 1, IDX 2, IDX 3
  • Paths: Home Path, Cold Path, Thawed Path
  • Buckets: hot_v1_100, hot_v1_101, db_lt_et_80, db_lt_et_101, db_lt_et_70
  • Bucket contents: *.data (Source/Sourcetype/Host Metadata, e.g. 1 source::/my/log, 2 source::/blah), *.tsidx, rawdata
  • Rawdata: "apple pie and ice cream is delicious", "an apple a day keeps doctor away", at seek offsets 100 and 150
  • TSIDX: LEXICON (apple, beer, coke, ice, java, …) and POSTING, bounded by et and lt

Bucket Lifecycle

Events → hot bucket ($Home Path)
  [Hot Bucket is Full]             → warm bucket ($Home Path)
  [Too Many Warms]                 → cold bucket ($Cold Path) [Cheaper Storage]
  [Out of Space or Bucket is Old]  → $Frozen Path or Deleted
  [Explicit User Action]           → $Thawed Path
How do we search?

  • Consult the lexicon and combine the posting lists
      – brian OR tijuana => (0, 2) OR (4) = (0, 2, 4)
  • Use the values array to get seek address, _time, source and sourcetype for (0, 2, 4)
  • Use the seek addresses to read rawdata at offsets (130, 190, 589)
  • Send "results" to the search
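
The lookup steps can be traced in a few lines of Python using the illustrative numbers from the Lexicon and Values Array slides. The dictionaries below are a hand-copied subset of those tables with terms lowercased; the function names are hypothetical, and reading the rawdata itself is left out. Looking postings (0, 2, 4) up in the values table gives seek addresses 130, 190 and 589.

```python
# Subset of the illustrative lexicon: term -> posting list.
LEXICON = {
    "africa": [0],
    "bogota": [4],
    "brian": [0, 2],
    "matty": [5, 6],
    "tijuana": [4],
}

# Illustrative values array: posting id -> (seek address, _time).
VALUES = {
    0: (130, 1372689024),
    1: (150, 1372689084),
    2: (190, 1372689050),
    3: (389, 1372689050),
    4: (589, 1372689050),
    5: (800, 1372689050),
    6: (1399, 1372689050),
}


def search_or(terms):
    """Union the posting lists of all terms (term1 OR term2 ...)."""
    hits = set()
    for term in terms:
        hits.update(LEXICON.get(term.lower(), []))
    return sorted(hits)


def search_and(terms):
    """Intersect the posting lists of all terms (term1 AND term2 ...)."""
    lists = [set(LEXICON.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


def seek_addresses(postings):
    """Look up where each matching event starts in rawdata."""
    return [VALUES[p][0] for p in postings]


if __name__ == "__main__":
    postings = search_or(["brian", "tijuana"])
    print(postings, seek_addresses(postings))
```

An AND search works the same way with an intersection instead of a union, so a term missing from the lexicon empties the result before any rawdata is read.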
  
Search Model Example

sourcetype=syslog ERROR | top user | fields - percent

Disk
  → sourcetype=syslog ERROR: fetch events from disk, apply schema
  → Intermediate results table
  → top user: summarize into table of top 10 users
  → Intermediate results table
  → fields - percent: remove column showing percentage
  → Final results table
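
The pipe model reads naturally as function composition: each stage takes a table of rows and returns a new one. The sketch below mimics the three stages in plain Python over a handful of made-up events. The sample data and function names are assumptions, fields are given as already extracted rather than derived at read time, and these functions only imitate the commands' visible effect as described above.

```python
from collections import Counter


def fetch(events, sourcetype, keyword):
    """sourcetype=<sourcetype> <keyword>: keep matching events as field dicts."""
    for event in events:
        if event["sourcetype"] == sourcetype and keyword in event["_raw"]:
            yield event


def top(rows, field, limit=10):
    """top <field>: most common values with their count and percent."""
    counts = Counter(row[field] for row in rows if field in row)
    total = sum(counts.values())
    return [
        {field: value, "count": n, "percent": 100.0 * n / total}
        for value, n in counts.most_common(limit)
    ]


def fields_remove(rows, name):
    """fields - <name>: drop one column from every row."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


# Made-up events standing in for what is fetched from disk.
EVENTS = [
    {"sourcetype": "syslog", "user": "alice", "_raw": "ERROR disk full"},
    {"sourcetype": "syslog", "user": "bob", "_raw": "ERROR timeout"},
    {"sourcetype": "syslog", "user": "alice", "_raw": "ERROR timeout"},
    {"sourcetype": "syslog", "user": "alice", "_raw": "INFO ok"},
    {"sourcetype": "access", "user": "bob", "_raw": "ERROR 500"},
]

if __name__ == "__main__":
    # sourcetype=syslog ERROR | top user | fields - percent
    intermediate = top(fetch(EVENTS, "syslog", "ERROR"), "user")
    print(fields_remove(intermediate, "percent"))
```

Each intermediate list corresponds to one of the intermediate results tables in the diagram.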
  
What can we do with events?

  • It's not just search …
  • SPL = Search Processing Language
      – Inspired by *nix pipes
      – Schema on read
      – 130+ search commands for slicing thru data
  • Versatile visualization library
  • Scheduling and alerting
  • …

Who uses it: LOB Owners/Executives, System Administrator, Operations Teams, Security Analysts, IT Executives, Application Developers, Auditors, Website/Business Analysts, Customer Support

Solution areas: IT Operations Management, Web Intelligence, Business Analytics, Application Management, Security and Compliance

Take it for a spin …

http://www.splunk.com/download/

  - Download
  - Try Splunk Cloud – AWS

WE'RE HIRING !!
(in SF & valley)
More Related Content

Similar to Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015

What You Need To Know About The Top Database Trends
What You Need To Know About The Top Database TrendsWhat You Need To Know About The Top Database Trends
What You Need To Know About The Top Database TrendsDell World
 
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jones
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason JonesASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jones
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jonesarborjjones
 
Cabrinety-NIST Project: AMIA DAS 2015
Cabrinety-NIST Project: AMIA DAS 2015Cabrinety-NIST Project: AMIA DAS 2015
Cabrinety-NIST Project: AMIA DAS 2015charthai
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMaris Elsins
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code EuropeDavid Pilato
 
Chaining 7 vulnerabilities in Citrix ShareFile On-Premise
Chaining 7 vulnerabilities in Citrix ShareFile On-PremiseChaining 7 vulnerabilities in Citrix ShareFile On-Premise
Chaining 7 vulnerabilities in Citrix ShareFile On-PremiseJohanna Curiel
 
Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Eva Tse
 
Analysis of Compromised Linux Server
Analysis of Compromised Linux ServerAnalysis of Compromised Linux Server
Analysis of Compromised Linux Serveranandvaidya
 
Linux-Permission
Linux-PermissionLinux-Permission
Linux-PermissionColin Su
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoringIben Rodriguez
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
13.11.02 inside playground(抄)
13.11.02 inside playground(抄)13.11.02 inside playground(抄)
13.11.02 inside playground(抄)Kei Nakazawa
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and VisualizationSurasak Sanguanpong
 
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarM|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarMariaDB plc
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward
 
Application-engaged Dynamic Orchestration of Optical Network Resources
Application-engaged Dynamic Orchestration of Optical Network ResourcesApplication-engaged Dynamic Orchestration of Optical Network Resources
Application-engaged Dynamic Orchestration of Optical Network ResourcesTal Lavian Ph.D.
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1seungdon Choi
 

Similar to Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015 (20)

What You Need To Know About The Top Database Trends
What You Need To Know About The Top Database TrendsWhat You Need To Know About The Top Database Trends
What You Need To Know About The Top Database Trends
 
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jones
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason JonesASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jones
ASERT's DDoS Malware Corral, Volume 1 by Dennis Schwarz and Jason Jones
 
SSL Securing Oracle DB
SSL Securing Oracle DBSSL Securing Oracle DB
SSL Securing Oracle DB
 
Cabrinety-NIST Project: AMIA DAS 2015
Cabrinety-NIST Project: AMIA DAS 2015Cabrinety-NIST Project: AMIA DAS 2015
Cabrinety-NIST Project: AMIA DAS 2015
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
 
Chaining 7 vulnerabilities in Citrix ShareFile On-Premise
Chaining 7 vulnerabilities in Citrix ShareFile On-PremiseChaining 7 vulnerabilities in Citrix ShareFile On-Premise
Chaining 7 vulnerabilities in Citrix ShareFile On-Premise
 
Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014
 
Analysis of Compromised Linux Server
Analysis of Compromised Linux ServerAnalysis of Compromised Linux Server
Analysis of Compromised Linux Server
 
Linux-Permission
Linux-PermissionLinux-Permission
Linux-Permission
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoring
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
13.11.02 inside playground(抄)
13.11.02 inside playground(抄)13.11.02 inside playground(抄)
13.11.02 inside playground(抄)
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and Visualization
 
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarM|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
Application-engaged Dynamic Orchestration of Optical Network Resources
Application-engaged Dynamic Orchestration of Optical Network ResourcesApplication-engaged Dynamic Orchestration of Optical Network Resources
Application-engaged Dynamic Orchestration of Optical Network Resources
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1
 

Recently uploaded

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015

  • 1. Data Through Splunk. Ledion Bitincka (ledion@splunk.com), Alex Batsakis (abatsakis@splunk.com), Architects
  • 2. Spelunking: to explore underground caves. Splunking: to explore machine data. Splunk: make machine data accessible, usable and valuable to everyone.
  • 3. What Does Machine Data Look Like? Sources: Twitter, Care IVR, Middleware Error, Order Processing
  • 4. Machine Data Contains Critical Insights. Sources: Twitter, Care IVR, Middleware Error, Order Processing. Fields: Customer ID, Order ID, Customer's Tweet, Time Waiting On Hold, Twitter ID, Product ID, Company's Twitter ID
  • 5. Machine Data Contains Critical Insights. Sources: Twitter, Care IVR, Middleware Error, Order Processing. Fields: Order ID, Customer's Tweet, Time Waiting On Hold, Product ID, Company's Twitter ID, Customer ID, Twitter ID
  • 6. Search, Investigate and Explore Your Data. Find and fix issues and incidents dramatically faster across your organization. Data sources: Energy, Manufacturing, Shipping, RFID, Web Services, Developers, App Support, Telecoms, Networking, Desktops, Servers, Security, Databases/DWH, Storage, Messaging, Online Shopping Carts, Clickstream, GPS/Cellular, Social Media
  • 7. Turning Machine Data into Operational Intelligence. Reactive to Proactive: Search and Investigate, Proactive Monitoring and Alerting, Operational Visibility, Real-time Business Insight
  • 8. Let's drill down ...
  • 9. Massive Linear Scalability to 100s of TBs/Day. Auto load-balanced forwarding to as many Splunk Indexers as you need to index TB/day. Offload search load to Splunk Search Heads.
  • 10. How data moves thru Splunk
  • 11. Consider this chunk of data from a log file: /var/log/secure.log
      ...
      2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
      2013/07/01T14:31:24.234-0400 Sean is originally Canadian
      2013/07/01T14:30:50.234-0400 Brian spends his time in:
       - Kentucky with phone number 345.567.3456
       - New Jersey
      2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
       - Tijuana: 345 Main St.
       - Saskatchewan: 3 One Lane
       - Colombia: 567 White line Dr. Bogota
      2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer
      2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
      2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them
      ...
  • 12. Pipeline Data
      Host: my_host
      Index: my_index
      _raw: 2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
            2013/07/01T14:31:24.234-0400 Sean is originally Canadian
            2013/07/01T14:30:50.234-0400 Brian spends his time in: ...
      UTF-8; Line Broken
      _conf: <key here>
  • 13. Pipelines/Processors. Inputs (TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline) -> Parsing Queue -> Parsing Pipeline (utf8, linebreaker, header) -> Agg Queue -> Merging Pipeline (aggregator) -> Typing Queue -> Typing Pipeline (regex replacement, annotator) -> Index Queue -> Index Pipeline (tcp out, syslog out, indexer)
  • 14. Queue. One thread's process step inserts pData (Pipeline Data) into the queue; another thread's process step removes it.
      ✓ Queue size bounded by memory
      ✓ Variable size Pipeline Data
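The queue on slide 14 can be illustrated with a minimal Python sketch: one thread inserts variable-size pData items and another removes them, with the queue capped by total bytes rather than by item count. The class name `ByteBoundedQueue`, the `max_bytes` limit and the `run_demo` helper are hypothetical names chosen for illustration; this is not Splunk's implementation.

```python
import threading
from collections import deque


class ByteBoundedQueue:
    """FIFO queue whose capacity is a byte budget (hypothetical sketch)."""

    def __init__(self, max_bytes):
        self._max_bytes = max_bytes
        self._used = 0
        self._items = deque()
        self._cond = threading.Condition()

    def insert(self, pdata):
        with self._cond:
            # Block while the budget is exhausted. An oversized item is still
            # accepted when the queue is empty so the producer cannot deadlock.
            while self._items and self._used + len(pdata) > self._max_bytes:
                self._cond.wait()
            self._items.append(pdata)
            self._used += len(pdata)
            self._cond.notify_all()

    def remove(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            pdata = self._items.popleft()
            self._used -= len(pdata)
            self._cond.notify_all()
            return pdata


def run_demo(events, max_bytes=16):
    """Push events through the queue from one thread to another."""
    q = ByteBoundedQueue(max_bytes)
    received = []

    def consumer():
        for _ in events:
            received.append(q.remove())

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:
        q.insert(event)  # blocks whenever the byte budget is full
    worker.join()
    return received
```

Because the producer blocks when the budget is full, back-pressure propagates upstream, which is the situation the persistent queue on the next slide is meant to absorb.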
  • 15. Persistent Queue. On a Splunk host, pData flows from the network into the Input Q and out through the Tcpout Q to the network. When the internal queues are full, the Persistent Q provides a much bigger queue.
  • 16. Indexing. Inputs (TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline) -> Parsing Queue -> Parsing Pipeline (utf8, linebreaker, header) -> Agg Queue -> Merging Pipeline (aggregator) -> Typing Queue -> Typing Pipeline (regex replacement, annotator) -> Index Queue -> Index Pipeline (tcp out, syslog out, indexer)
  • 17. What's an index? Collective term used to describe rawdata and associated tsidx & metadata files.
  • 18. Inside an index
      [09:31:39] [1065]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/
      $ ls -l
      total 0
      drwx------   2 lbitincka  admin   68 Feb  6 12:57 colddb
      drwx------  17 lbitincka  admin  578 Jul  1 09:31 db
      drwx------  13 lbitincka  admin  442 Jun 27 16:36 summary
      drwx------   2 lbitincka  admin   68 Aug 24  2012 thaweddb
      Callouts: index name (the _internaldb directory); bucket locations (its subdirectories)
  • 19. Inside hot & warm path
      [10:20:00] [1074]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/
      $ ll
      -rw-------   1 lbitincka  admin   1.3K Jun 27 13:50 .bucketManifest
      drwx------  17 lbitincka  admin   578B Jul  1 10:19 .
      drwx--x--x  17 lbitincka  admin   578B Jun 26 12:45 db_1372264972_1371998026_159
      drwx--x--x  16 lbitincka  admin   544B Jun 18 08:20 db_1371225002_1370897127_156
      drwx--x--x  16 lbitincka  admin   544B Jun 26 12:50 db_1371998025_1371214200_158
      drwx--x--x  14 lbitincka  admin   476B Jun 26 12:50 db_1372265194_1372264972_160
      drwx--x--x  14 lbitincka  admin   476B Jul  1 10:19 hot_v1_161
      drwx------   6 lbitincka  admin   204B Nov 12  2012 ..
      drwx--x--x   2 lbitincka  admin    68B Aug 24  2012 GlobalMetaData
      -rw-------   1 lbitincka  admin    10B Aug 24  2012 CreationTime
      -rw-------   1 lbitincka  admin     0B Dec 21  2012 .db_1356066789_1355865285_43.rbsentinel
      Notice hot & warm buckets. Bucket names: db_<lt>_<et>_<id>
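The bucket naming convention on slide 19 lends itself to a short parsing sketch. It assumes that `lt` and `et` are the latest and earliest event times in the bucket, expressed as epoch seconds; the names `Bucket`, `parse_bucket_name` and `overlaps` are hypothetical helpers, not part of any Splunk API.

```python
import re
from typing import NamedTuple, Optional


class Bucket(NamedTuple):
    latest: int     # <lt>, assumed to be the newest event time (epoch seconds)
    earliest: int   # <et>, assumed to be the oldest event time (epoch seconds)
    bucket_id: int  # <id>


# Matches warm/cold bucket directories such as db_1372264972_1371998026_159.
_BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_bucket_name(name: str) -> Optional[Bucket]:
    """Return the parsed bucket, or None for names like hot_v1_161."""
    match = _BUCKET_RE.match(name)
    if match is None:
        return None
    latest, earliest, bucket_id = (int(part) for part in match.groups())
    return Bucket(latest, earliest, bucket_id)


def overlaps(bucket: Bucket, search_earliest: int, search_latest: int) -> bool:
    """True if the bucket's time span intersects the search time range."""
    return bucket.earliest <= search_latest and bucket.latest >= search_earliest
```

With the time span encoded in the directory name, a time-bounded search can skip whole buckets without opening any file inside them.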
  • 20. Inside a bucket
      [10:31:32] [1092]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/db_1371998025_1371214200_158/
      $ ll
      -rw-------   1 lbitincka  admin    27M Jun 21 16:49 1371847782-1371214200-1941140693112088843.tsidx
      -rw-------   1 lbitincka  admin   7.1M Jun 26 12:43 1371998025-1371847783-907852835360656754.tsidx
      -rw-------   1 lbitincka  admin   2.5M Jun 26 12:43 merged_lexicon.lex
      -rw-------   1 lbitincka  admin   459K Jun 26 12:43 bloomfilter
      -rw-------   1 lbitincka  admin   1.3K Jun 23 10:33 Sources.data
      -rw-------   1 lbitincka  admin   615B Jun 23 10:33 SourceTypes.data
      drwx------  17 lbitincka  admin   578B Jul  1 10:31 ..
      drwx--x--x  16 lbitincka  admin   544B Jun 26 12:50 .
      -rw-------   1 lbitincka  admin   451B Jun 23 10:31 Strings.data
      drwx------   4 lbitincka  admin   136B Jun 26 12:42 rawdata
      -rw-------   1 lbitincka  admin   116B Jun 23 10:33 Hosts.data
      -rw-------   1 lbitincka  admin    76B Jun 23 10:33 splunk-autogen-params.dat
      -rw-------   1 lbitincka  admin    52B Jun 26 12:50 bucket_info.csv
      -rw-------   1 lbitincka  admin    49B Jun 26 12:43 optimize.result
      -rw-------   1 lbitincka  admin    10B Jun 26 12:43 .rawSize
      -rw-------   1 lbitincka  admin     8B Jun 26 12:43 .sizeManifest4.1
  • 21. Metadata & Bloomfilters
      *.data
        - Metadata about sources, sourcetypes and hosts of the events contained in each bucket
      Bloomfilters
        - Efficient data structure that authoritatively rules out buckets, i.e. tells you with 100% certainty that a query term is NOT present in a bucket
        - By default consulted by every search
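The "rules out with certainty" property on slide 21 is the defining behaviour of a Bloom filter: a negative answer is always correct, while a positive answer only means "maybe". The following is a generic textbook sketch, not Splunk's on-disk bloomfilter format; the class name, the bit-array size and the use of SHA-256 for hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by salting the term with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False means the term was definitely never added;
        # True means the bucket still has to be searched.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(term)
        )
```

A search would call `might_contain` once per bucket and per term, and open the tsidx files only for buckets that answer `True`.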
  • 22. Rawdata (not raw data)
      Collection of compressed (gzipped) blocks, called slices
        - Concatenated together in a rawdata/journal.gz
        - Think "cat chunkA.gz chunkB.gz ... chunkN.gz > journal.gz"
      Slices contain the actual raw events.
      The pool of concatenated slices can be seeked into
        - Location offsets are pointed to by the values array pointers in tsidx
      Such organization allows us to zoom in to the right slice
        - Reduces the amount of decompression time & volume compared to having a single, massive rawdata file
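The concatenated-slices idea on slide 22 can be tried out with Python's standard `gzip` and `zlib` modules. This sketch only demonstrates the general technique (independent gzip members concatenated into one file, each readable from its own byte offset); the real journal.gz layout may include framing that is not shown here, and `build_journal` and `read_slice` are hypothetical helper names.

```python
import gzip
import zlib


def build_journal(slices):
    """Compress each slice on its own and concatenate the results.

    Returns the journal bytes and the byte offset at which each slice starts.
    """
    journal = bytearray()
    offsets = []
    for raw in slices:
        offsets.append(len(journal))
        journal += gzip.compress(raw)
    return bytes(journal), offsets


def read_slice(journal, offset):
    """Decompress only the gzip member that starts at the given offset."""
    # wbits=31 tells zlib to expect a gzip header; decompression stops at the
    # end of that member, so later slices are never inflated.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(journal[offset:])
```

Reading one slice touches only that slice's compressed bytes, whereas `gzip.decompress(journal)` would inflate every member in order.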
  • 23. TSIDX
      Time series index (inverted index optimized for time)
      Lexicon:
        - Keywords within the specified time range
        - Postings list array
      Values array:
        - Structure that contains posting values, seek address, _time etc.
        - Seek address points to offsets in rawdata
      Time is of transcendent importance in Splunk
        - tsidx filenames expose et and lt
        - Values arrays arranged in time order as well
  • 24. Lexicon (sample events: the secure.log excerpt from slide 11)
      Term      Posting List
      3         4
      345       3,4
      ...       ...
      Africa    0
      Brian     0,2
      Bogota    4
      ...       ...
      Matty     5,6
      Tijuana   4
  • 25. Values Array (sample events: the secure.log excerpt from slide 11)
      Posting   Seek addr   _time        host      ...
      0         130         1372689024   my_host   ...
      1         150         1372689084   my_host   ...
      2         190         1372689050   my_host   ...
      3         389         1372689050   my_host   ...
      4         589         1372689050   my_host   ...
      5         800         1372689050   my_host   ...
      6         1399        1372689050   my_host   ...
      ...       ...         ...          ...
      * All values are for illustration purposes and not necessarily accurate.
  • 26. Tsidx merging
      Data streaming produces many small tsidx files
      Searching is inefficient when going against many tsidx files
      splunk-optimize
        - Merges small tsidx files into larger ones
        - Consolidates lexicons and posting lists
  • 27. Putting it together. [Diagram] Indexers: IDX 1, IDX 2, IDX 3. Paths: Home Path, Cold Path, Thawed Path. Buckets: hot_v1_100, hot_v1_101, db_lt_et_80, db_lt_et_101, db_lt_et_70, each holding *.data, *.tsidx and rawdata. TSIDX: LEXICON (apple, beer, coke, cream, ice, java, ...) with POSTING lists and et, lt, it values. Rawdata: "apple pie and ice cream is delicious", "an apple a day keeps doctor away" (offsets 100, 150). Source/Sourcetype/Host Metadata: 1 source::/my/log, 2 source::/blah
  • 28. Bucket Lifecycle. Events -> hot bucket ($Home Path) -> [Hot Bucket is Full] -> warm bucket ($Home Path) -> [Too Many Warms] -> cold bucket ($Cold Path, cheaper storage) -> [Out of Space or Bucket is Old] -> $Frozen Path or deleted -> [Explicit User Action] -> $Thawed Path
  • 29. How do we search?
      Consult the lexicon and combine the posting lists
        - brian OR tijuana => (0, 2) OR (4) = (0, 2, 4)
      Use the values array to get the seek address, _time, source and sourcetype for (0, 2, 4)
      Use the seek addresses to read rawdata at offsets (130, 190, 589)
      Send "results" to the search
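The search steps on slide 29 can be traced in a few lines of Python, using the illustrative lexicon from slide 24 and the illustrative values array from slide 25 as plain dictionaries. The variable and function names are hypothetical, and the data is the slides' own sample data, which the authors note is not necessarily accurate.

```python
# Term -> posting list, copied from the lexicon table on slide 24.
LEXICON = {
    "3": [4],
    "345": [3, 4],
    "africa": [0],
    "brian": [0, 2],
    "bogota": [4],
    "matty": [5, 6],
    "tijuana": [4],
}

# Posting -> (seek address, _time), copied from the table on slide 25.
VALUES = {
    0: (130, 1372689024),
    1: (150, 1372689084),
    2: (190, 1372689050),
    3: (389, 1372689050),
    4: (589, 1372689050),
    5: (800, 1372689050),
    6: (1399, 1372689050),
}


def search_or(terms):
    """Union of the posting lists of all terms."""
    postings = set()
    for term in terms:
        postings.update(LEXICON.get(term.lower(), []))
    return sorted(postings)


def search_and(terms):
    """Intersection of the posting lists of all terms."""
    lists = [set(LEXICON.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


def seek_addresses(postings):
    """Look up the rawdata offset for each posting in the values array."""
    return [VALUES[p][0] for p in postings]
```

For `brian OR tijuana` this yields postings 0, 2 and 4, whose seek addresses in the slide 25 table are 130, 190 and 589.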
  • 30. Search Model Example: sourcetype=syslog ERROR | top user | fields - percent
      sourcetype=syslog ERROR: fetch events from disk, apply schema -> intermediate results table
      | top user: summarize into table of top 10 users -> intermediate results table
      | fields - percent: remove column showing percentage -> final results table
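The three-stage search on slide 30 maps naturally onto three small functions, each consuming the table produced by the previous one. This is a simplified Python model of the data flow only: the function names and the sample events are hypothetical, and keyword matching here is a plain case-sensitive substring test rather than SPL's actual matching rules.

```python
from collections import Counter


def fetch_events(events, sourcetype, keyword):
    """Stage 1: keep events of the given sourcetype whose raw text has the keyword."""
    return [
        e for e in events
        if e["sourcetype"] == sourcetype and keyword in e["_raw"]
    ]


def top(rows, field, limit=10):
    """Stage 2: summarize into the most common values of a field."""
    counts = Counter(r[field] for r in rows if field in r)
    total = sum(counts.values())
    return [
        {field: value, "count": count, "percent": 100.0 * count / total}
        for value, count in counts.most_common(limit)
    ]


def remove_fields(rows, *names):
    """Stage 3: drop the named columns from every row."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]


# Hypothetical sample events, invented for this sketch.
EVENTS = [
    {"sourcetype": "syslog", "user": "alice", "_raw": "ERROR disk full"},
    {"sourcetype": "syslog", "user": "bob", "_raw": "ERROR timeout"},
    {"sourcetype": "syslog", "user": "alice", "_raw": "ERROR timeout"},
    {"sourcetype": "syslog", "user": "carol", "_raw": "INFO login ok"},
    {"sourcetype": "access", "user": "bob", "_raw": "ERROR 500"},
]

intermediate = top(fetch_events(EVENTS, "syslog", "ERROR"), "user")
final = remove_fields(intermediate, "percent")
```

Each stage sees only a table of rows, which is what lets commands be chained in any order, much like *nix pipes.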
  • 31. What can we do with events?
      It's not just search ...
      SPL = Search Processing Language
        - Inspired by *nix pipes
        - Schema on read
        - 130+ search commands for slicing thru data
      Versatile visualization library
      Scheduling and alerting
      ...
  • 32. Users: LOB Owners/Executives, System Administrator, Operations Teams, Security Analysts, IT Executives, Application Developers, Auditors, Website/Business Analysts, Customer Support. Use cases: IT Operations Management, Web Intelligence, Business Analytics, Application Management, Security and Compliance
  • 33. Take it for a spin ...
      http://www.splunk.com/download/
        - Download
        - Try Splunk Cloud - AWS
      WE'RE HIRING !! (in SF & valley)