Field Extractions: Making Regex Your Buddy
1. Making Reg[Ee]x Your Buddy
August 15, 2011
(?i)(mi(chael|ke) wilde), Splunk Ninja
Thursday, August 18, 11
2. Hi, I’m Michael Wilde
• You may know me from:
Splunk Worldwide Users’ Conference 2 © Copyright Splunk 2011
3. What is RegEx?
“Finite Automata”
• Regular expressions invented in the 1950s by mathematician Stephen Cole Kleene
• Implemented by “ed” and “grep” creator Ken Thompson in 1973
• Pattern-matching language for text processing
• Has slightly different implementations (Perl, POSIX)
• Way cryptic at first sight
4. Why should you care?
• Field extraction is a requirement for reporting
• Index-time filtering & routing
• You’ll seem smart
• It will be useful beyond Splunk
• You might score with the (ladies|dudes) at (MakersFaire|ComiCon).
6. Thinking Regex
• Log events are a great place to start; they have structure
• Don’t overthink it. The pattern is there, waiting to be discovered
• Don’t be lazy and use wildcards too much
• Learn to love “NOT” regexes:
\S+ (not whitespace)   \D+ (not a digit)   \W+ (not a word character)   [^,]+ (not a comma)
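Here is a minimal Python sketch of the four “NOT” classes above. Python's `re` module stands in for Splunk's PCRE engine (for plain ASCII text like this, these classes behave the same way), and the sample strings are invented for illustration.

```python
import re

# Invented sample strings, one per "NOT" class from the slide.
SAMPLES = {
    r"[^,]+": "user=alice,action=login,src=10.0.0.5",  # anything but a comma
    r"\S+": "GET /index.html 200",                      # anything but whitespace
    r"\D+": "abc123def",                                # anything but a digit
    r"\W+": "a, b; c",                                  # anything but a word character
}


def demo_not_classes():
    """Return every run that each negated class pulls out of its sample."""
    return {pattern: re.findall(pattern, text) for pattern, text in SAMPLES.items()}


if __name__ == "__main__":
    for pattern, hits in demo_not_classes().items():
        print(f"{pattern:6} -> {hits}")
```

Each class describes what a field can't contain, so every match stops exactly at the delimiter instead of relying on a wildcard.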
8. Be nice to your RegEx engine
• MS-DOS taught us to be laaaaaaaaaaaaaaaaazy with *.*
• A regex engine matches character by character, and backtracks when a later part of the pattern fails to match
• Match in as few steps as possible
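A small Python sketch of why lazy wildcards hurt. On a made-up event with two quoted values, `.*` runs to the end of the line and then backtracks to the *last* quote, while the “NOT” class `[^"]*` stops at the first quote and never has to give anything back.

```python
import re

# Hypothetical event with two quoted values, invented for illustration.
EVENT = 'user="alice" action="login"'


def lazy_wildcard(event):
    # ".*" grabs everything to the end, then backtracks to the LAST quote.
    return re.search(r'user="(.*)"', event).group(1)


def not_a_quote(event):
    # '[^"]*' stops at the first quote, so nothing has to be given back.
    return re.search(r'user="([^"]*)"', event).group(1)


if __name__ == "__main__":
    print(lazy_wildcard(EVENT))  # alice" action="login
    print(not_a_quote(EVENT))    # alice
```

The wildcard version is both slower (it overshoots and backtracks) and wrong (it swallows the second field).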
9. Regexes in Splunk
Search Language: “rex”, “erex”, “regex”
Indexing: Filtering data (in|out), line breaking, timestamp extraction
Field Extraction
10. IFX
• Splunk has a built-in "interactive field extractor"
• It can be useful. Give it samples of data, and it will attempt to learn a regex and persist a single field
• It limits the number of events it displays in its viewer
• You might not see your search results when using it? Huh?
11. What if we could use that "intelligent" stuff IFX was doing, but in the search language?
12. Meet "erex"
• Allows you to give it examples, but it works on your search results
• Allows you to give it counterexamples of stuff you don't want to match on
• Builds you a proper rex command
13. ...there's an app for that, right?
14. Field Extractor App
• Imagine you could use your mouse, highlight fields, name them, persist them, go home early and never write regex.
• David Carasso's Field Extractor app is like a "workbench for field extraction"
• Download it from SplunkBase
16. The | regex search command
• Did you know Splunk crushes all terms to lowercase?
• If you need to look for specific patterns, or even words, and respect the case the original events are in, use | regex
• index=splunktv | regex _raw="(MP3|M4A)"   <-- notice: this is a case-sensitive pattern match.
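To illustrate the case-sensitivity point, this Python sketch filters a few invented events the way `| regex _raw="(MP3|M4A)"` would, then repeats the match with the `(?i)` flag (the same flag used in the title slide's regex) to ignore case. It models only the pattern matching, not how Splunk stores or searches terms.

```python
import re

# Hypothetical raw events, invented for illustration.
EVENTS = [
    "GET /music/song.MP3",
    "GET /music/song.mp3",
    "GET /video/clip.M4A",
    "GET /video/clip.m4a",
]


def case_sensitive(events):
    # Plain pattern: only the uppercase extensions match.
    return [e for e in events if re.search(r"(MP3|M4A)", e)]


def case_insensitive(events):
    # (?i) at the start of the pattern turns case sensitivity off.
    return [e for e in events if re.search(r"(?i)(MP3|M4A)", e)]


if __name__ == "__main__":
    print(case_sensitive(EVENTS))
    print(case_insensitive(EVENTS))
```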
17. What about good ole Rex?
• Search-time field extractions via your own regexes, in the search language
• Name your fields
• Reuse everyone else’s work!
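Naming your fields is the heart of `rex`. Here is a minimal Python sketch with a made-up event. PCRE (and so `rex`) writes a named group as `(?<name>...)`, while Python's `re` spells it `(?P<name>...)`. The field names `src_ip`, `dest_ip`, and `port` are hypothetical.

```python
import re

# Hypothetical firewall-style event, invented for illustration.
EVENT = "src=10.0.0.5 dst=10.0.0.9 port=443 action=allowed"

# Python's spelling of named groups; rex would use (?<src_ip>...) instead.
PATTERN = re.compile(r"src=(?P<src_ip>\S+) dst=(?P<dest_ip>\S+) port=(?P<port>\d+)")


def extract_fields(event):
    """Return the named groups as a dict of field name -> value."""
    match = PATTERN.search(event)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_fields(EVENT))
```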
18. A few more tricks for you
20. Regex in host extraction
• Splunk will attempt to do the right thing, but some log sources will make that hard for Splunk, and you’ll blame Splunk
• props.conf & transforms.conf are needed to properly extract hostnames in some cases (F5 Big-IP and HP networking gear)
• Use the default settings in props.conf, and use your own settings as well
21. Priority boarding in props.conf
[source::...a...]
TRANSFORMS-ahosts = ahostextraction
priority = 1

[source::...z...]
TRANSFORMS-zhosts = zhostextraction
priority = 99

What if the source we were matching against had the word "arizona" in it? It will match both, right? Use "priority" to control matching. 99 is higher than 1, so 99 is the higher priority. Yeah, I know... weird.
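A rough Python model of the “arizona” question, under stated assumptions: it treats `...` as “any characters, including `/`” and `*` as “any characters except `/`” (my reading of the props.conf wildcard rules), checks both stanzas against a hypothetical source path, and orders the hits so the higher `priority` number comes first, as the slide describes. The helper names are made up, and this is not how Splunk evaluates stanzas internally.

```python
import re

# Hypothetical model of the two stanzas above: (source pattern, transform, priority).
STANZAS = [
    ("...a...", "ahostextraction", 1),
    ("...z...", "zhostextraction", 99),
]


def source_pattern_to_regex(pattern):
    """Translate a source:: pattern into a Python regex (assumed wildcard rules)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")      # '...' = any characters, '/' included
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' = any characters except '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(parts) + "$"


def matching_transforms(source):
    """Transforms whose stanza matches the source, highest priority number first."""
    hits = [
        (priority, name)
        for pattern, name, priority in STANZAS
        if re.match(source_pattern_to_regex(pattern), source)
    ]
    return [name for _, name in sorted(hits, reverse=True)]


if __name__ == "__main__":
    print(matching_transforms("/var/log/arizona.log"))  # both stanzas match
```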
23. Splunk is so smart, except when it’s not
<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<item id="3265747">*.served.com/*</item>
<rule_comment id="2" name="Block certificate validation errors"><![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>
AUTO-KV pulled the “id” field out of every event. Yay!!!
24. “id” is not the field name. Look closer, Agent Starling.
<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<rule_comment id="2" name="Block certificate validation errors"><![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>
We can educate Splunk on dynamically pulling the KEY and VALUE with...
25. Dynamic Key Value Extraction ...but tailored for our needs
REGEX for the “KEY” is <([^=]+)=
Less than, followed by (anything that is “not an equal sign”, greedy match), followed by an equal sign
<policy id="3">
<cp id="5"
<item id="28015">
Keep going, dude!
REGEX for the “VALUE” is "([^"]+)">
A quote, followed by (anything that is not a quote, greedy match), followed by a quote, followed by a greater-than sign
<policy id="3">
<cp id="5"
<item id="28015">
26. Persist your sweet dynamic KV patterns
props.conf & transforms.conf required
Create an entry in props.conf like this:
[m86_dynamic_kv]
REPORT-m86fields = mym86kv
Create an entry in transforms.conf like this:
[mym86kv]
REGEX = <([^=]+)="([^"]+)">
FORMAT = $1::$2
<policy id="3">Finjan HTTPS policy</policy>
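The stanza above can be tried out in Python on the XML lines from slide 24, one line at a time. `SAMPLE`, `clean_key`, and `extract` are names made up for this sketch. `clean_key` is my approximation of Splunk's default key cleaning (anything that isn't a letter or digit becomes an underscore), which is how a key like `policy id` would end up as a `policy_id` field; treat that helper as an assumption.

```python
import re

# The XML lines from the sample event, one per line.
SAMPLE = [
    '<policy id="3">Finjan HTTPS policy</policy>',
    '<cp id="5" name="Active Content" display_name="Active Content"/>',
    '<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>',
    '<item id="28015">Format error in CRL lastUpdate field</item>',
    '<rule_comment id="2" name="Block certificate validation errors">'
    "<![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>",
]

# REGEX from the transforms.conf stanza: group 1 is the key, group 2 the value.
KV = re.compile(r'<([^=]+)="([^"]+)">')


def clean_key(key):
    # Assumed approximation of Splunk's key cleaning: non-alphanumerics -> "_".
    return re.sub(r"[^A-Za-z0-9]", "_", key)


def extract(lines, pattern=KV):
    """Mimic FORMAT = $1::$2 line by line: cleaned key -> value."""
    fields = {}
    for line in lines:
        m = pattern.search(line)
        if m:
            fields[clean_key(m.group(1))] = m.group(2)
    return fields


if __name__ == "__main__":
    print(extract(SAMPLE))
```

Only two of the five lines produce a field: on the others, the value's closing quote is followed by more attributes instead of `>`.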
27. Dang it! It wasn’t perfect
Some of our events don’t finish their XML tag right after a quote
Create an entry in props.conf like this:
[m86_dynamic_kv]
REPORT-m86fields = mym86kv
Create an entry in transforms.conf like this:
[mym86kv]
REGEX = <([^=]+)="([^"]+)[^>]+>
FORMAT = $1::$2
<rule_comment id="690" name="Log everythin
Image files"><![CDATA[Logs all content passin
the system except for ......
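The same Python check with the revised REGEX: after the captured value, `[^>]+` soaks up whatever else is in the tag (the closing quote plus any further attributes) before the `>`. As before, `SAMPLE`, `clean_key`, and `extract` are hypothetical names, and `clean_key` only approximates Splunk's key cleaning.

```python
import re

# The XML lines from the sample event, one per line.
SAMPLE = [
    '<policy id="3">Finjan HTTPS policy</policy>',
    '<cp id="5" name="Active Content" display_name="Active Content"/>',
    '<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>',
    '<item id="28015">Format error in CRL lastUpdate field</item>',
    '<rule_comment id="2" name="Block certificate validation errors">'
    "<![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>",
]

# Revised REGEX: key, value, then anything up to the end of the tag.
KV_FIXED = re.compile(r'<([^=]+)="([^"]+)[^>]+>')


def clean_key(key):
    # Assumed approximation of Splunk's key cleaning: non-alphanumerics -> "_".
    return re.sub(r"[^A-Za-z0-9]", "_", key)


def extract(lines, pattern=KV_FIXED):
    """Mimic FORMAT = $1::$2 line by line: cleaned key -> value."""
    fields = {}
    for line in lines:
        m = pattern.search(line)
        if m:
            fields[clean_key(m.group(1))] = m.group(2)
    return fields


if __name__ == "__main__":
    print(extract(SAMPLE))
```

Every line now yields a field, and each one is named after its element (`cp_id`, `group_id`, ...) rather than a bare `id`.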
28. Think you’re good? Try extracting the “service” field
2011/07/21 19:27:22.071 [(ninja-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ninja-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
Your job is to create a multi-valued field, as the “service” field exists multiple times in each event
29. Look for the obvious patterns
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
Your brain will tell you to look for “anything after the first comma” after that left bracket and before the second comma
30. ...and your brain was wrong.
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
This is NOT a “service”.
Dang... what are we gonna do now?
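One literal Python translation of “anything after the first comma after a left bracket, before the second comma” is `\[[^,]+,([^,]+),`. That pattern is my own wording of the idea, not something from the slides. Run against the sample event (written as a single line), it finds the two real services and then two tokens that are clearly not services.

```python
import re

# The sample event from the slides, as a single line.
EVENT = (
    "2011/07/21 19:27:22.071 "
    "[(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)"
    "[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] "
    "recoverSubject(V1.21.47,OSM:1t7Dg201000:i:"
    "1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,"
    "172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:"
    "1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms"
)

# Left bracket, anything but a comma, a comma, then capture up to the next comma.
NAIVE = re.compile(r"\[[^,]+,([^,]+),")


def naive_services(event):
    return NAIVE.findall(event)


if __name__ == "__main__":
    for hit in naive_services(EVENT):
        print(hit)
```

Because `[^,]+` happily crosses `]`, spaces, and other brackets, the later matches start at `[]` and `[Principal[3]` and capture session tokens instead of services.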
31. What is common with “services”
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
They’re all alphanumeric or “word” characters: 0-9A-Za-z_
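A tiny Python check of that observation. With the `re.ASCII` flag, `\w` is exactly `[0-9A-Za-z_]` (without the flag, Python's `\w` also accepts non-ASCII letters). The two service names from the sample event pass; two neighboring tokens from the same event do not.

```python
import re

# Two services from the sample event, plus two tokens that are not services.
TOKENS = ["opensocial", "auth", "/makeRequest", "OSM:1t7Dg201000:i:"]


def all_word_chars(token):
    # With re.ASCII, \w is exactly the class [0-9A-Za-z_].
    return re.fullmatch(r"\w+", token, re.ASCII) is not None


def all_word_chars_explicit(token):
    return re.fullmatch(r"[0-9A-Za-z_]+", token) is not None


if __name__ == "__main__":
    for token in TOKENS:
        print(token, all_word_chars(token))
```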
32. But what about the preceding text?
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
Left bracket, followed by some stuff, followed by a comma... but it’s not consistent. Sometimes a “(” left paren is in there.
33. This is a better match
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
\[[-(a-zA-Z0-9]+,([a-zA-Z]+),
Say the matching pattern out loud. It will help.
Left bracket, followed by anything in this character list (a hyphen, a left paren, letters, or digits; the hyphen goes first so it isn’t read as a range), greedy. Followed by a comma, and then create a capturing group of text that matches upper or lower case Roman alphabet, greedy (as many times as possible). End capturing group, then followed by a comma.
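Checking the better pattern in Python against the same event (as a single line). `re.findall` keeps scanning after the first hit, so it returns every service it finds; `rex` by default stops after one match, which the `max_match` trick later in the deck addresses.

```python
import re

# The sample event from the slides, as a single line.
EVENT = (
    "2011/07/21 19:27:22.071 "
    "[(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)"
    "[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] "
    "recoverSubject(V1.21.47,OSM:1t7Dg201000:i:"
    "1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,"
    "172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:"
    "1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms"
)

# Left bracket; hyphens, parens, letters, digits; a comma; captured letters; a comma.
# The hyphen sits first in the class so it is a literal, not a range.
SERVICE = re.compile(r"\[[-(a-zA-Z0-9]+,([a-zA-Z]+),")


def services(event):
    return SERVICE.findall(event)


if __name__ == "__main__":
    print(services(EVENT))
```

The colon in `[OSM:` isn't in the character list, so the session tokens that fooled the naive pattern no longer match.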
34. Can’t be too hard to extend it, right?
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
\[[-(a-zA-Z0-9]+,([a-zA-Z]+),[^[]+\[[-(a-zA-Z0-9]+,([a-zA-Z]+),
Left bracket, followed by anything in this character list (greedy). Followed by a comma, and then create a capturing group of text that matches upper or lower case Roman alphabet, greedy (as many times as possible). End capturing group, then followed by a comma. Followed by anything that is NOT a left bracket, followed by.....
35. Sad Trombone
This one has four services
2011/07/21 19:27:27.596 [(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>][ninja4-be05,tether,TetherAccountService.bindAccount][ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service] upgradeSubject(V1.21.49,"INT",[LIM:131292312:s:1311276361:b8f677d957eb3f7b9622247b72374c791720bc17,true],{internalAppName=twitter-sync},"tether",null)=[Principal[2],[INT:131292312/twitter-sync:1311276447:df9dd0175bd2e6107c2dfae36dfd9a9dc11f0631,false,20y]] in 15ms
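A Python check of the extended two-service pattern. `re.search` stops at the first match, much like `rex` with its default of one match. On the first event it picks up both services; on the four-service event it returns only the first two, hence the sad trombone. Both events are written as single lines.

```python
import re

EVENT_1 = (
    "2011/07/21 19:27:22.071 "
    "[(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)"
    "[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] "
    "recoverSubject(V1.21.47,OSM:1t7Dg201000:i:"
    "1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,"
    "172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:"
    "1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms"
)

EVENT_2 = (
    "2011/07/21 19:27:27.596 "
    "[(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)"
    "[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>]"
    "[ninja4-be05,tether,TetherAccountService.bindAccount]"
    "[ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service] "
    'upgradeSubject(V1.21.49,"INT",[LIM:131292312:s:'
    "1311276361:b8f677d957eb3f7b9622247b72374c791720bc17,true],"
    '{internalAppName=twitter-sync},"tether",null)=[Principal[2],[INT:'
    "131292312/twitter-sync:"
    "1311276447:df9dd0175bd2e6107c2dfae36dfd9a9dc11f0631,false,20y]] in 15ms"
)

# Two copies of the service pattern joined by "anything that is not a left bracket".
TWO = re.compile(
    r"\[[-(a-zA-Z0-9]+,([a-zA-Z]+),[^\[]+\[[-(a-zA-Z0-9]+,([a-zA-Z]+),"
)


def first_two_services(event):
    m = TWO.search(event)
    return m.groups() if m else None


if __name__ == "__main__":
    print(first_two_services(EVENT_1))
    print(first_two_services(EVENT_2))  # tether and auth are missed
```

Hard-coding the number of repetitions breaks as soon as an event carries a different number of services.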
36. Remember “rex”? He devours data
But you can make “rex” very hungry and control how much lunch he eats. By default, he only gets “one helping of meat”.
37. Using max_match with rex
You can limit or expand the number of times it runs:
rex max_match=20 "\[[-(a-zA-Z0-9]+,(?<service>[a-zA-Z]+),"
Instead of that last regex that matched “two” services, let’s just match one, and tell rex to repeat our pattern matching
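A Python analog of `max_match`, sketched with `re.finditer` and `itertools.islice`. The `rex_like` helper is hypothetical; it follows the documented `rex` behavior that `max_match` defaults to 1 and that 0 means no limit. The event is the four-service one, written as a single line.

```python
import itertools
import re

EVENT_2 = (
    "2011/07/21 19:27:27.596 "
    "[(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)"
    "[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>]"
    "[ninja4-be05,tether,TetherAccountService.bindAccount]"
    "[ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service] "
    'upgradeSubject(V1.21.49,"INT",[LIM:131292312:s:'
    "1311276361:b8f677d957eb3f7b9622247b72374c791720bc17,true],"
    '{internalAppName=twitter-sync},"tether",null)=[Principal[2],[INT:'
    "131292312/twitter-sync:"
    "1311276447:df9dd0175bd2e6107c2dfae36dfd9a9dc11f0631,false,20y]] in 15ms"
)

# The rex pattern, with Python's (?P<name>...) spelling of the named group.
SERVICE = re.compile(r"\[[-(a-zA-Z0-9]+,(?P<service>[a-zA-Z]+),")


def rex_like(event, max_match=1):
    """Collect up to max_match values of 'service'; 0 means unlimited."""
    values = (m.group("service") for m in SERVICE.finditer(event))
    if max_match == 0:
        return list(values)
    return list(itertools.islice(values, max_match))


if __name__ == "__main__":
    print(rex_like(EVENT_2))                # default: one helping
    print(rex_like(EVENT_2, max_match=20))  # the whole multi-value list
```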
38. You can persist this in config files
props.conf & transforms.conf required
Create an entry in props.conf like this:
[ninjasocial]
REPORT-ninjafields = myepicregex
Create an entry in transforms.conf like this:
[myepicregex]
REGEX = \[[-(a-zA-Z0-9]+,(?<service>[a-zA-Z]+),
MV_ADD = TRUE
39. And now for something difficult
gaming logs - Team Fortress
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
40. I need the data
gaming logs - Team Fortress
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
41. Who’s who?
How do we know who did what to whom?
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
42. actor, actor_id, actor_team, actor_type
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
actee, actee_id, actee_type, actee_team
43. Didn’t we see this slide before?
How do we know who did what to whom?
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
44. See that pattern? Remember “max_match”?
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
45. See that pattern? Remember “max_match”?
"The Administrator<61><BOT><Red>"
"MoreGun<56><BOT><Blue>"
Using rex / mv_add, let’s capture it into some temporary “multi-value” fields
46. “Temporary” MultiValue Fields
actor_name_z   The Administrator,MoreGun
actor_id_z     61,56
actor_type_z   BOT,BOT
actor_team_z   Red,Blue
Using rex / mv_add, let’s capture it into some temporary “multi-value” fields
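The slides don't show the exact regex used for this capture step, so the pattern below is my own assumption: a quoted name followed by `<id><type><team>`. This Python sketch collects every match into parallel lists, which is the shape of the temporary `*_z` multi-value fields in the table above; `PLAYER`, `FIELDS`, and `temp_mv_fields` are hypothetical names.

```python
import re

LOG = (
    'L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed '
    '"MoreGun<56><BOT><Blue>" with "flamethrower" '
    '(attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")'
)

# Assumed pattern: "Name<id><type><team>" inside double quotes.
PLAYER = re.compile(r'"([^"<]+)<(\d+)><([^>]+)><([^>]+)>"')

FIELDS = ("actor_name_z", "actor_id_z", "actor_type_z", "actor_team_z")


def temp_mv_fields(line):
    """Return one list per field, in match order (attacker first, victim second)."""
    matches = PLAYER.findall(line)
    # zip(*matches) turns [(name, id, type, team), ...] into one tuple per field.
    columns = list(zip(*matches)) or [()] * len(FIELDS)
    return {field: list(col) for field, col in zip(FIELDS, columns)}


if __name__ == "__main__":
    for field, values in temp_mv_fields(LOG).items():
        print(field, values)
```

The quoted `"flamethrower"` and position values are skipped because they aren't followed by the `<id><type><team>` tail.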
47. Evaluate & Transform with “mvindex”
Multi-value fields have a “position value” in the array
mvindex        0                    1
actor_name_z   The Administrator    MoreGun
actor_id_z     61                   56
actor_type_z   BOT                  BOT
actor_team_z   Red                  Blue
48. It’s time for our fields to split up!
Multi-value fields have a “position value” in the array
| eval actor_name = mvindex(actor_name_z,0) | eval actee_name = mvindex(actor_name_z,1)
actor_name = The Administrator
actee_name = MoreGun
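In Python terms, a multi-value field is just a list and `mvindex` is a positional lookup. The `mvindex` function below is a hypothetical stand-in for the SPL function, covering only the two-argument form used above: positions start at 0, negative positions count from the end, and here an out-of-range position returns `None`.

```python
def mvindex(values, index):
    """Hypothetical stand-in for SPL's two-argument mvindex()."""
    if -len(values) <= index < len(values):
        return values[index]
    return None  # treat an out-of-range position as null


# The temporary multi-value field from the table above.
actor_name_z = ["The Administrator", "MoreGun"]

actor_name = mvindex(actor_name_z, 0)  # position 0: who did it
actee_name = mvindex(actor_name_z, 1)  # position 1: who it was done to

if __name__ == "__main__":
    print(f"actor_name = {actor_name}")
    print(f"actee_name = {actee_name}")
```

The same two lookups on `actor_id_z`, `actor_type_z`, and `actor_team_z` would fill in the rest of the actor and actee fields.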
49. Resources
• regexlib.com
• regular-expressions.info
• gskinner.com/RegExr
• Reggy / RegExhibit
• RegexBuddy (JGSoft.com)