Security data deluge
1. Security Data Deluge: Zions Bank's Hadoop Based Security Data Warehouse
Claiming the Intersection @ Information Security and Fraud
Brian Christian, CTO and Co-Founder, Zettaset
Michael Fowkes, SVP and Director of Fraud Prevention and Security Analytics, Zions Bancorp
2. Security Data Warehouse
• A security data warehouse is a massive database intended to aggregate event data across your entire enterprise for long-term, large-scale security/fraud related analytics
• The utility of this system is realized once the data is normalized into a common format and mined by experts with intimate understanding of the data itself
• It’s also affordable to the common company
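As a rough illustration of "normalized into a common format", the sketch below maps two differently shaped records onto one shared set of fields. The field names, the pipe-delimited firewall layout, and the web-log keys are all made up for this example; they are not the schema used at Zions.

```python
from datetime import datetime, timezone

# Hypothetical common schema: every normalized event carries these keys.
COMMON_FIELDS = ("ts", "source", "src_ip", "user", "action")


def normalize_firewall(line: str) -> dict:
    """Parse a made-up firewall record of the form epoch|src_ip|action."""
    epoch, src_ip, action = line.strip().split("|")
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat()
    return {"ts": ts, "source": "firewall", "src_ip": src_ip,
            "user": None, "action": action.lower()}


def normalize_web(record: dict) -> dict:
    """Map a made-up web-log dict onto the same common schema."""
    return {"ts": record["time"], "source": "web", "src_ip": record["client"],
            "user": record.get("login"), "action": record["method"].lower()}


events = [
    normalize_firewall("1338000000|10.0.0.5|DROP"),
    normalize_web({"time": "2012-05-26T02:40:10+00:00", "client": "10.0.0.5",
                   "login": "jdoe", "method": "POST"}),
]
```

Once both sources share the same keys, a single query or model can look at firewall and web activity for the same address side by side.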
3. Why SDW Today
• “More data is generated in 3 days than in the history of the world to 2003” – Eric Schmidt
• Fraudsters continue to innovate and leverage explosive growth of portable computing
• Fraudsters continue to study “us”
• Through massive data sets and comprehensive analytic modeling, you/we can begin to study them
4. SDW isn’t a product
• Security is never a product; it’s a process
• There are past “processes” that help build the system
– Key example is SIEM: SIEM creates a “Big Data” problem for InfoSec. Instead of dumping that data after 60 days, store ALL the data in the SDW, even the events you’re currently not logging
• When fraud teams work with security, the common platform will accelerate the program
5. SDW Data Collection
• The SDW is intended to collect EVERYTHING
– Everything in terms of event data, not just security
• SDW business analysts live by the expression “the more data I receive, the better I feel”
• All data is created equal, but data mined in certain combinations is more interesting than others
– Trust but verify: This goes for both automated controls and human behavior
6. SDW System Availability
• The system should be easy to use
– Average skilled labor to maintain the platform/cluster
• The system must be fault tolerant
– At 700 TB to 2 PB of data, when hard drives fail the system should maintain its process
• The SDW should grow as needed without performance degradation
– Affordable to meet tomorrow’s demand
7. SDW is used for Mining
• SDW is where Information Security and Fraud teams meet to solve problems
– Most InfoSec and Fraud teams don’t communicate
– Silos of data are collapsed into a single view
• The SDW is a laboratory, not a SIEM
– Are there indicators that other users/accounts are susceptible to fraud/attack?
– Run the model through the entire database to account for similar attributes
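One way to picture "run the model through the entire database to account for similar attributes" is a scan for other accounts that share an indicator with a known fraud case. The sketch below is a toy, in-memory version; the field names (account, src_ip, device_id) and the sample values are assumptions for illustration, and a real run would be a Hive job over the warehouse rather than a Python loop.

```python
def similar_accounts(events, fraud_event, keys=("src_ip", "device_id")):
    """Return {account: shared indicators} for accounts other than the known victim."""
    # Indicators are (field, value) pairs taken from the known fraud event.
    indicators = {(k, fraud_event[k]) for k in keys if fraud_event.get(k)}
    hits = {}
    for event in events:
        if event["account"] == fraud_event["account"]:
            continue  # skip the account we already know about
        shared = {(k, event.get(k)) for k in keys} & indicators
        if shared:
            hits.setdefault(event["account"], set()).update(shared)
    return hits


# Made-up sample data.
known_fraud = {"account": "A1", "src_ip": "198.51.100.9", "device_id": "dev-1"}
all_events = [
    {"account": "A1", "src_ip": "198.51.100.9", "device_id": "dev-1"},
    {"account": "A2", "src_ip": "198.51.100.9", "device_id": "dev-7"},
    {"account": "A3", "src_ip": "192.0.2.44", "device_id": "dev-3"},
]
suspects = similar_accounts(all_events, known_fraud)
```

Here account A2 is flagged because it was seen from the same source address as the known fraud case.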
8. What is a Security Data Warehouse
• A Security Data Warehouse is a massive mineable database
• The system is horizontally scalable to Petabytes of data
• The amount of data available for analysis is historical and many years old
• It’s affordable to the common person!
– Risk Management are the common people in IT
16. SIEM Issues
• Rigid data models
• Did not deal well with unstructured data
• RDBMS performance with large data sets
• Limited ways to interact with data
19. Why Hadoop & Hive
• Scalability / performance
• Manage resources
– Fair Scheduler
• Fault tolerance
• SQL like language (HiveQL)
– Most of the staff had SQL skills
• Easy application / tool integration
– ODBC / JDBC driver
• Fast data ingestion
• Can handle unstructured data
• Flexibility & extensibility
– UDFs
– Streaming jobs
22. ETL Philosophy
• “Pre-mine” Intelligence during the ETL process
– Add value at time of capture (enrichment)
– Quickly analyze important data
– Automate time-sensitive activities
• Load all data… no filtering of data that will be loaded into the warehouse
– You don’t know what you will want tomorrow
– Leverage file compression, RCFiles, and table partitioning to address storage / performance issues
• Store 2 years’ worth of historical data
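A minimal sketch of "add value at time of capture" combined with table partitioning: each event is tagged during load and routed to a per-day partition. The blacklist contents, field names, and table name are made up for illustration. The day=... directory layout follows Hive's usual partition naming, and partitioning by day is an assumption inferred from the day filter in the sample queries.

```python
from datetime import datetime

# Made-up blacklist; in practice this would come from an IP blacklist data set.
IP_BLACKLIST = {"203.0.113.7"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with fields added at load time."""
    enriched = dict(event)
    enriched["src_blacklisted"] = event["src_ip"] in IP_BLACKLIST
    # Derive the partition key (calendar day) from the event timestamp.
    enriched["day"] = datetime.fromisoformat(event["ts"]).date().isoformat()
    return enriched


def partition_path(table: str, event: dict) -> str:
    """Build a Hive-style partition directory for an enriched event."""
    return f"{table}/day={event['day']}"


raw = {"ts": "2012-05-26T02:40:00", "src_ip": "203.0.113.7", "action": "drop"}
loaded = enrich(raw)
target = partition_path("firewall_logs", loaded)
```

Nothing is filtered out here: every event is kept, and the enrichment only adds columns that make later mining cheaper.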
24. The Team
• Data Scientist
• Data Analyst
• LOB User
• Data Engineer
• Data Platform Administrator
28. Data Examples
• Web server logs
• Customer database(s)
• OS logs
• Fraud model alerts
• DB logs
• Mainframe activity logs
• Proxy server logs
• HTTP (customer Internet activity) logs
• SPAM filter logs
• A/V events
• ATM/POS transactions
• DLP events
• Credit card transactions
• VPN logs
• G/L logs
• DNS logs
• ACH, Wire, and Deposit transactions
• Firewall logs
• On-line banking application logs
• E-mail logs
• Deposits / savings / time account daily balances
• Router / switch logs
• IP blacklists
• Vulnerability scan results
Over 120 data sets
29. How users interact with the data
• Data Scientist
– KNIME
– R
– Tableau
• Data Analyst
– SQuirreL SQL
– Hive command line
• LOB User
– Datameer
– Custom web app for common queries
    • Parameterized queries
    • Output to HTML table or tab delimited file
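The "custom web app for common queries" idea can be sketched as a parameterized query whose result is written out as tab-delimited text. The example below uses Python's built-in sqlite3 as a stand-in for the Hive ODBC/JDBC connection, and the table, columns, and rows are made up; it shows the pattern, not the actual application.

```python
import csv
import io
import sqlite3


def run_common_query(conn, day, dst_ip):
    """Run a fixed query with user-supplied parameters bound safely."""
    sql = (
        "SELECT action, COUNT(*) AS log_entry_count "
        "FROM firewall_logs WHERE day = ? AND dst_ip = ? "
        "GROUP BY action ORDER BY action"
    )
    cur = conn.execute(sql, (day, dst_ip))
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def to_tsv(header, rows):
    """Render a header and result rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firewall_logs (day TEXT, dst_ip TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO firewall_logs VALUES (?, ?, ?)",
    [("2012-05-26", "1.1.1.1", "drop"),
     ("2012-05-26", "1.1.1.1", "drop"),
     ("2012-05-26", "1.1.1.1", "accept"),
     ("2012-05-26", "2.2.2.2", "drop")],
)
header, rows = run_common_query(conn, "2012-05-26", "1.1.1.1")
report = to_tsv(header, rows)
```

Binding the day and address as parameters keeps line-of-business users from having to write or edit SQL themselves.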
30. Sample firewall log query
SELECT
  collect_set(src_ip) AS src_ips,
  dst_ip,
  protocol,
  action,
  rule_uid,
  collect_set(rule) AS rules,
  count(*) AS log_entry_count
FROM firewall_logs
WHERE day = '2012-05-26'
  AND dst_ip = '1.1.1.1'
GROUP BY dst_ip, protocol, action, rule_uid
ORDER BY dst_ip, protocol, rule_uid
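For readers less familiar with Hive's collect_set, the following sketch shows the same kind of aggregation in plain Python: rows are grouped by destination, protocol, action, and rule, with the distinct source addresses and rule names collected per group along with a row count. The column names and sample rows are assumptions for illustration only.

```python
from collections import defaultdict


def summarize(rows):
    """Group firewall rows and collect distinct values per group."""
    groups = defaultdict(
        lambda: {"src_ips": set(), "rules": set(), "log_entry_count": 0}
    )
    for row in rows:
        key = (row["dst_ip"], row["protocol"], row["action"], row["rule_uid"])
        group = groups[key]
        group["src_ips"].add(row["src_ip"])   # like collect_set(src_ip)
        group["rules"].add(row["rule"])       # like collect_set(rule)
        group["log_entry_count"] += 1         # like count(*)
    return dict(sorted(groups.items()))


# Made-up sample rows.
sample_rows = [
    {"src_ip": "10.0.0.1", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "drop", "rule_uid": "r-17", "rule": "block-inbound"},
    {"src_ip": "10.0.0.2", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "drop", "rule_uid": "r-17", "rule": "block-inbound"},
    {"src_ip": "10.0.0.1", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "drop", "rule_uid": "r-17", "rule": "block-inbound"},
]
summary = summarize(sample_rows)
```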
31. Dealing with unstructured data
Via Perl:
while (<INFILE>) {
    # Only process "Transaction Inquiry" records
    if ($_ =~ /\s+\w+\s+Transaction\sInquiry\s+/) {
        chomp $_;
        # Records are pipe-delimited: timestamp|ip|port|payload
        my ($ts, $ip, $port, $payload) = split(/\|/, $_);
        if ($payload =~ /\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+/) {
            $product_code = $1;
        }
        if ($payload =~ /\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+/) {
            $account = $1;
        }
        if ($payload =~ /\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+/) {
            $bank = $1;
        }
        if ($payload =~ /\s+(\w+)\s+Transaction\s+Inquiry\s+/) {
            $agent = $1;
        }
        print OUTFILE "$ts|$ip|$port|$product_code|$account|$bank|$agent\n";
    }
}
32. Dealing with unstructured data
Via Hive:
SELECT
  ts,
  ip,
  port,
  regexp_extract(payload, '\\s+Account:\\s+\\d+\\s+Appl:\\s+(\\w+)\\s+', 1) AS product_code,
  regexp_extract(payload, '\\s+Account:\\s+(\\d+)\\s+Appl:\\s+\\w+\\s+', 1) AS account,
  regexp_extract(payload, '\\s+\\w+\\s+Transaction\\s+Inquiry\\s+(\\w+)\\s+', 1) AS bank,
  regexp_extract(payload, '\\s+(\\w+)\\s+Transaction\\s+Inquiry\\s+', 1) AS agent
FROM mainframe_logs
WHERE day = '2012-05-26'
  AND payload RLIKE '\\s+\\w+\\s+Transaction\\sInquiry\\s+'
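To check what these patterns extract, the same regular expressions can be tried against a sample record in Python. The log line below is invented to fit the patterns (the real mainframe log layout is not shown in the deck), and the function name is hypothetical.

```python
import re

# Same patterns as above, written as Python raw strings.
FILTER = re.compile(r"\s+\w+\s+Transaction\s+Inquiry\s+")
PATTERNS = {
    "product_code": re.compile(r"\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+"),
    "account": re.compile(r"\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+"),
    "bank": re.compile(r"\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+"),
    "agent": re.compile(r"\s+(\w+)\s+Transaction\s+Inquiry\s+"),
}


def parse_inquiry(line: str):
    """Return extracted fields for a Transaction Inquiry record, else None."""
    if not FILTER.search(line):
        return None
    # Records are assumed to be pipe-delimited: ts|ip|port|payload
    ts, ip, port, payload = line.rstrip("\n").split("|", 3)
    fields = {"ts": ts, "ip": ip, "port": port}
    for name, pattern in PATTERNS.items():
        match = pattern.search(payload)
        fields[name] = match.group(1) if match else None
    return fields


# Invented sample line; note the payload's leading and trailing spaces,
# which the patterns rely on.
sample = ("2012-05-26 10:15:00|10.1.2.3|5001|"
          " AGENT42 Transaction Inquiry BANK01 Account: 123456789 Appl: DDA \n")
parsed = parse_inquiry(sample)
```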
34. Visualization example
[Figure: Fraud Events Per Day. Calendar-style chart for 2010, 2011 and 2012, with months (JAN to DEC) across and days of the week (Sun to Sat) down.]