8. SPAM attacks on our CEO
Eva Chen, CEO & Co-Founder, Trend Micro
9. SPAM attacks on me
http://rebelrowsers.com/AV8z8s/index.html
http://videospornogratis.com.es/DGx9Zv/index.html
http://newarkpartytents.com/BvFNK66F/index.html
18. New Approach for Cyber Threat Solution
Diagram: data sources CDN / xSP, Researcher Intelligence, Honeypot, Web Crawler, Trend Micro Mail Protection, Trend Micro Endpoint Protection and Trend Micro Web Protection (300+ million sensors worldwide)
19. SPN Solution Architecture
Diagram: a processing pipeline of Sourcing & Analysis → Create Solution → Validate & Quality Assurance → Solution Distribution → Solution Adoption. File, Web / URL, Email, Domain and IP data feed the File Reputation Service, Web Reputation Service and Email Reputation Service, together with SPN Correlation; Smart Protection serves the customer, and Community Intelligence forms the feedback loop.
20. Challenges We Face
6 TB of data and 15 billion lines of logs received daily by …
It has become a Big Data challenge!
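To put the daily volumes above in perspective, the sketch below converts them into sustained per-second rates and an average log-line size. It assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over 24 hours. The names `per_second`, `DailyLoad` and `avg_line_bytes` are hypothetical.

```cpp
#include <cassert>

// Seconds in one day; assumes the load is spread evenly over 24 hours.
constexpr double kSecondsPerDay = 24.0 * 60.0 * 60.0;  // 86,400

// Sustained per-second rate needed to absorb `per_day` items (or bytes).
constexpr double per_second(double per_day) { return per_day / kSecondsPerDay; }

// Daily volumes from the slide; decimal units are an assumption.
struct DailyLoad {
    double bytes_per_day;  // 6 TB
    double lines_per_day;  // 15 billion log lines
};

constexpr DailyLoad kSlideLoad{6e12, 15e9};

// Average size of one log line in bytes.
inline double avg_line_bytes(const DailyLoad& load) {
    assert(load.lines_per_day > 0.0);
    return load.bytes_per_day / load.lines_per_day;
}
```

With the slide's figures, this works out to roughly 174K log lines and about 69 MB per second, sustained around the clock, with an average of 400 bytes per line. Real traffic is likely burstier, so peak rates would be higher than these averages.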
21. Overview – Smart Feedback
Data sources and content, by layer:
• Access Layer
  – Akamai (*): URL users accessed
  – NSC_TmProxy_URLF_002: APP accessed malicious URL
• Exposure Layer
  – NSC_TmProxy_HFS_001: URL hosted suspicious/malicious file
  – AMSP_TMBP_NSC_001: URL hosted shellcode
  – BES_001: URL hosted suspicious/malicious content
  – SAL_001: URL hosted suspicious/malicious content
  – TMASE_001: Email contains suspicious/malicious content
• Infection Layer
  – VSAPI_001: File detected as suspicious/malicious
  – CENSUS_001: File executed on endpoint
  – AEGIS_001: APP with suspicious/malicious behavior
  – RCA_001: Endpoint infection chain
• Dynamic Layer
  – CONAN_001: File detected by heuristic rules
  – CONAN_002: Heuristic rule detection result of a file
  – DCE_001: Clean result
  – DRE_001, PEDif_001, LCE_001
22. Feedback Source in Terms of Products
Table: schema IDs and the products that report them.
Products: Gateway (IMSS/IWSS), Consumer Endpoint (Titanium/TIS), SMB Endpoint (WFBS), Enterprise Endpoint (OSCE)
Schema IDs: Akamai, NSC_TmProxy_URLF_002, NSC_TmProxy_HFS_001, AMSP_TMBP_NSC_001, BES_001, SAL_001, TMASE_001, VSAPI_001, CENSUS_001, AEGIS_001, RCA_001, CONAN_001, CONAN_002, DCE_001, DRE_001, PEDif_001, LCE_001
26. Unique Endpoint Counts by Industry (industry category feedback from Enterprise products only)
• Not specified: 1,564,471 (47.43%)
• Specified: 1,734,141 (52.57%)
27. SPN High Level Architecture
Diagram components: data sources (SPN Feedback, Honeypot, CDN/xSP) feeding the Data Sourcing Service Platform; Log Receiver and Log Post-processing; Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie and Adhoc-Query (Pig); Solr Cloud; Correlation Platform, Threat Connect, Census, DRR and Tracking; Global Object Cache and Akame Logging System; Trend Message Exchange (Message Bus); Email Reputation, File Reputation and Web Reputation Services and a 3rd-Party Data Feed Service; MySPN Platform with Web Pages and an API Server/Portal (SSO).
28. Service Stack of SPN
Diagram layers, top to bottom:
• Customers: internal (SAL/MKT, TS, RD) and external (Consumer, Enterprise)
• Service Catalogue: Threat Landscape, Risk Management, User Experience; MagicQ, ZDASE, Census, APT Report, Widget
• Correlation: Global Intelligence Network; Entity, Web, Mobile
• Infrastructure: Cloud Infra
• Data Catalogue: raw data feeds and cooked data feeds (Akamai, Zone Files, FRS, WRS, Census, Feedback, ERS, SPN)
29. SPN Ecosystem
Diagram components: API, MySPN Framework and Web Frontend; an OLAP System and an OLTP System; Data Mining, Solr, Adhoc-Query, Oozie, Pig and Hive; Sourcing via TME, Scribe and Flume (Avro, Protobuf); Streaming and MapReduce Engine; Hadoop (HDFS, HBase, HCatalog); RDB and Middleware / DB / K-V Stores; Data Inputs and Data Outputs.
43. Galileo15 Makes It Possible!!!
• 2 observations from the data
  – Sparse connections with a preference for low diameter
  – Incomplete connections
Figure: Domain–IP graph linking fahrzeugteile.shop.ebay.de, shop.ebay.ca and videogames.shop.ebay.com.au to 66.135.202.89, 66.135.205.141, 66.135.213.211, 66.135.213.215, 66.211.160.11 and 66.211.180.27, with missing edges highlighted.
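The "incomplete connection" observation can be made concrete. Suppose a group of domains and IPs should form a complete bipartite community. Then every domain–IP pair with no observed resolution is a missing edge. The sketch below lists those gaps and the resulting edge density. The function names are hypothetical, and any edge set you feed it (such as the one in the test) is illustrative, not the slide's actual data.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (domain, IP)

// Returns every (domain, ip) pair in domains x ips that is absent from
// `observed`, i.e. the edges a complete domain-IP community would still need.
std::vector<Edge> missing_edges(const std::vector<std::string>& domains,
                                const std::vector<std::string>& ips,
                                const std::set<Edge>& observed) {
    std::vector<Edge> missing;
    for (const auto& d : domains)
        for (const auto& ip : ips)
            if (observed.count({d, ip}) == 0) missing.emplace_back(d, ip);
    return missing;
}

// Fraction of the possible domain-IP pairs that were actually observed.
double edge_density(const std::vector<std::string>& domains,
                    const std::vector<std::string>& ips,
                    const std::set<Edge>& observed) {
    assert(!domains.empty() && !ips.empty());
    const double possible = double(domains.size()) * double(ips.size());
    const double absent = double(missing_edges(domains, ips, observed).size());
    return (possible - absent) / possible;
}
```

A density close to 1 with a few gaps is the kind of "sparse but nearly complete" structure the slide describes. The missing pairs are candidates for links that simply were not observed yet.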
44. Galileo15
Transform masses of raw data into community structures
Figure: domains fahrzeugteile.shop.ebay.de, shop.ebay.ca and videogames.shop.ebay.com.au and IPs 66.135.202.89, 66.135.205.141, 66.135.213.211, 66.135.213.215, 66.211.160.11 and 66.211.180.27 grouped into a community (labels: Host, Host IP, Domain).
58. Why “Community that Fits”?
Figure: Domain–IP graph linking Server114.at.youporn.com, Server346.at.youporn.com, Server420.at.youporn.com, Server730.at.youporn.com and Server923.at.youporn.com to IPs 203.77.186.249, 69.164.22.140, 69.164.22.153, 69.164.22.154, 87.248.203.50, 87.248.207.141, 87.248.210.147, 87.248.211.194, 87.248.211.223, 87.248.212.55 and 87.248.218.132; labels: WTP, DUL from ERS, Malicious, Phishing.
60. Some porn websites are not blocked but are caught by Galileo15
Figure: domains grouped with IP 203.77.186.249: amateurmaturevoyeur.pornblink.com, bareasswhipping.pornblink.com, desihotpoint.com, freexxxamaturefucking.pornblink.com, fxxkinsilly.com, goldengatebridgebuilt.pornblink.com, hotolderwomenshowingpants.pornblink.com, matureamateurgallerysoftcore.pornblink.com, skinnyteenanallesbian.pornblink.com, spermster.com; labels: WTP, Phishing, Malicious, Pornography.
61. Applications
Clique Enumeration, Clique Matching, Clique Ranking
Figure: Domain–IP graphs at T0, T0+15, T0+30, T0+45 and T0+60 (x-axis: Time; y-axis: #Cliques)
62. Applications
Clique Enumeration, Clique Matching, Clique Ranking
Figure: Domain–IP graphs at T0, T0+15, T0+30, T0+45 and T0+60 (x-axis: Time; y-axis: #Cliques), with patterns labeled Whitelisting, Anomaly detection, Web Hosting and Fast Flux
63. Applications
Figure: Domain–IP graphs at T0, T0+15, T0+30, T0+45 and T0+60 (x-axis: Time; y-axis: #Cliques), annotated with applications: 1 Whitelisting, 2 Anomaly detection, 3 Web Hosting, 4 Fast Flux
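The slides do not spell out how cliques from consecutive time windows (T0, T0+15, …) are matched. One plausible approach, used here purely as an assumption, pairs each clique with its most similar successor by Jaccard similarity over member domains and IPs. Whether a clique keeps matching over time is one signal the applications listed above could use, but that mapping is also an assumption. The threshold and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A clique is represented here simply as the set of its member nodes
// (domains and IPs mixed, e.g. "shop.ebay.ca" or "66.135.213.211").
using Clique = std::set<std::string>;

// Jaccard similarity |A ∩ B| / |A ∪ B| between two cliques.
double jaccard(const Clique& a, const Clique& b) {
    if (a.empty() && b.empty()) return 1.0;
    std::size_t common = 0;
    for (const auto& x : a) common += b.count(x);
    return double(common) / double(a.size() + b.size() - common);
}

// For each clique in window t, the index of its most similar clique in window
// t+1, or -1 if none reaches `threshold`. Matching rule (Jaccard + threshold)
// is an assumption, not Galileo15's documented method.
std::vector<int> match_cliques(const std::vector<Clique>& prev,
                               const std::vector<Clique>& next,
                               double threshold = 0.5) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    std::vector<int> match(prev.size(), -1);
    for (std::size_t i = 0; i < prev.size(); ++i) {
        double best = threshold;
        for (std::size_t j = 0; j < next.size(); ++j) {
            const double s = jaccard(prev[i], next[j]);
            if (s >= best) { best = s; match[i] = int(j); }
        }
    }
    return match;
}
```

Counting how many cliques survive from one window to the next gives a time series like the #Cliques axis in the figure. Other similarity measures or a one-to-one assignment would be equally reasonable choices.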
66. Summary
• Propose a brand-new community representation
• Provide a powerful graph-based correlation engine
• Reduce workload by 40.3%
• Bring huge business value
71. Parameters of Clique Enumeration algorithm
Example: $G(L, R, E)$ with $L = \{l_1, l_2, l_3, l_4\}$, $R = \{r_1, r_2, r_3\}$ and $E \subseteq \{(l_i, r_j) \mid 1 \le i \le 4,\ 1 \le j \le 3\}$; here $|L| = 4$, $|R| = 3$, $\mathrm{Deg}(l_1) = 2$, $\mathrm{Deg}(l_2) = 3$.
• $\gamma$: edge density of a quasi-clique
  – $|E| \ge \gamma\,|L|\,|R|$
• MinE: minimum support of each edge
  – $\#E(l_i, r_j) \ge \mathrm{MinE}$
• MinL, MaxL: minimum and maximum number of objects on the left side of a clique
  – $\mathrm{MinL} \le |L| \le \mathrm{MaxL}$
• MinR, MaxR: minimum and maximum number of objects on the right side of a clique
  – $\mathrm{MinR} \le |R| \le \mathrm{MaxR}$
• Min_DegL, Min_DegR: minimum degrees of objects on the left and right sides of a clique, respectively
  – $\mathrm{Deg}(l_i) \ge \mathrm{Min\_DegL}\ \forall l_i \in L$; $\mathrm{Deg}(r_j) \ge \mathrm{Min\_DegR}\ \forall r_j \in R$
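The parameters above act as acceptance tests for a candidate clique $(L, R)$. A minimal sketch of such a check follows. It assumes support counts #E(l_i, r_j) are stored per edge, counts only edges whose support is at least MinE, and measures degrees inside the candidate. `CliqueParams`, `Support` and `is_quasi_clique` are hypothetical names. This covers only the filtering step, not the enumeration algorithm itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical parameter bundle mirroring the slide's notation.
struct CliqueParams {
    double gamma;                      // minimum edge density of the quasi-clique
    int min_e;                         // MinE: minimum support of each edge
    std::size_t min_l, max_l;          // bounds on |L|
    std::size_t min_r, max_r;          // bounds on |R|
    std::size_t min_deg_l, min_deg_r;  // Min_DegL, Min_DegR
};

// Support #E(l, r) for each observed edge, keyed by (left id, right id).
using Support = std::map<std::pair<int, int>, int>;

// True if candidate (L, R) satisfies every constraint, counting only edges
// whose support is at least MinE.
bool is_quasi_clique(const std::vector<int>& L, const std::vector<int>& R,
                     const Support& support, const CliqueParams& p) {
    assert(p.gamma >= 0.0 && p.gamma <= 1.0);
    if (L.size() < p.min_l || L.size() > p.max_l) return false;
    if (R.size() < p.min_r || R.size() > p.max_r) return false;

    std::vector<std::size_t> deg_l(L.size(), 0), deg_r(R.size(), 0);
    std::size_t edges = 0;
    for (std::size_t i = 0; i < L.size(); ++i)
        for (std::size_t j = 0; j < R.size(); ++j) {
            auto it = support.find({L[i], R[j]});
            if (it != support.end() && it->second >= p.min_e) {
                ++edges; ++deg_l[i]; ++deg_r[j];
            }
        }

    // Density: |E| >= gamma * |L| * |R|
    if (double(edges) < p.gamma * double(L.size()) * double(R.size())) return false;
    // Degree constraints on both sides.
    for (auto d : deg_l) if (d < p.min_deg_l) return false;
    for (auto d : deg_r) if (d < p.min_deg_r) return false;
    return true;
}
```

For the slide's example shape (|L| = 4, |R| = 3, Deg(l1) = 2, Deg(l2) = 3), a graph with 9 of the 12 possible edges has density 0.75. It passes with γ = 0.7 and fails with γ = 0.8.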
72. Specification for Hadoop Environment
Number of Machines: 40
Machine Type: Dell PE2950
CPU: Quad-Core Xeon 5410 × 2
RAM: 4 GB (667 MHz) × 2
Disk: 300 GB SATA 7.2K RPM × 6
OS: RHEL AS4, 32-bit
83. Big World, Big Data
• Important numbers for WRS
  – 8 billion queries daily
  – 900 million URLs analyzed daily
  – < 0.01% of daily URLs identified as malicious
• Finding a needle in the haystack
2012/11/8 Confidential | Copyright 2012 Trend Micro Inc.
84. Processing Big Data
• Content analysis: 900 million unique URLs / 24 hr ≈ 10K URLs per second (0.1 ms per URL)
  – Challenge: how to coordinate, maintain and distribute work among a large set of machines (> 500 machines)?
• Raw log analysis: 3 terabytes of data each day
  – Challenge: how to store the data so that it is reliable and relevant data is fast to retrieve?
  – How to process logs (present + historical, ~500 TB) to provide vital statistics and trends?
Figure: User Queries (8 billion URLs per day), Raw Log (3 terabytes per day), Unique URLs (900 million per day) and Malicious URLs (19K per day), connected through Content Analysis, Anomaly Detection and Vital Statistics (Present View, Historical Trend).
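The content-analysis figures above are two views of one budget. Spreading 900 million unique URLs evenly over 24 hours gives about 10K URLs per second. A single serial worker would therefore have only about 0.1 ms per URL. The sketch below makes that arithmetic explicit. It also estimates how many parallel workers would be needed under a hypothetical per-URL cost `seconds_per_url`; that cost is not a figure from the slides.

```cpp
#include <cassert>
#include <cmath>

constexpr double kSecondsPerDay = 86400.0;

// Average arrival rate when `urls_per_day` are spread evenly over one day.
double urls_per_second(double urls_per_day) { return urls_per_day / kSecondsPerDay; }

// Time budget per URL, in milliseconds, if one serial worker must keep up.
double budget_ms_per_url(double urls_per_day) {
    return 1000.0 / urls_per_second(urls_per_day);
}

// Parallel workers needed when one URL costs `seconds_per_url` of work on one
// worker. The per-URL cost is a hypothetical input, not a slide figure.
long workers_needed(double urls_per_day, double seconds_per_url) {
    assert(seconds_per_url > 0.0);
    return static_cast<long>(std::ceil(urls_per_second(urls_per_day) * seconds_per_url));
}
```

For example, a hypothetical 50 ms of analysis per URL would require 521 workers running in parallel to keep pace with 900 million URLs a day.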
85. Today’s Agenda
• Discussion of the real-world design
  – Constraints
  – Requirements
• Sample of tools available
  – When to use them?
  – How?
86. What are we trying to do with Big Data?
87. Usage Triangle
• Historical domain–IP relations
• Historical access patterns
• Known malicious actors
…
• Detect abnormal behavior
• Group malicious domains
• Potentially malicious URLs
…
• Malicious activities?
88. Constraints Triangle
• What data to store?
• How much data to store?
• For how long?
• Readily accessible
• $$$
• Threat coverage
• How fast can discovery be?
89. CLS Observation
• Like the CAP theorem, where one can satisfy only 2 of 3 constraints, threat discovery can satisfy only 2 of 3 constraints: Coverage, Latency and Storage.
  – (Coverage+, Latency+): It is impossible to achieve fast discovery and large coverage without an enormous data store to provide the information needed for decision making.
  – (Latency+, Storage+): By focusing on a smaller set of URLs, we can provide fast discovery without the need for a huge data store.
  – (Coverage+, Storage+): By allowing a longer discovery time, we can enhance coverage without using a large data store.
90. It is all about the trade-off
91. Two schools of thought (1/2)
• (Storage+, Latency+)
  – Attacks are:
    • Wave-like in nature
      – Sudden appearance
      – Short lifespan
    • Disposable
      – Used once and thrown away
    • Regionalized
      – Global epidemics are less common
    • Few
      – < 0.01% of the daily unique URLs are malicious
93. Two schools of thought (2/2)
• (Coverage+, Latency+)
  – History repeats itself
    • So does hackers’ infrastructure (not so throwaway)
  – Protecting coverage is essential
    • Threats are detectable through more thorough investigation with larger context
  – Future-proof
    • Our solution reflects past knowledge
    • If we don’t accumulate, adapt and evolve our knowledge, our solution will become obsolete
95. It Boils Down to Streaming vs. Batch Processing
• Streaming looks at queries in real time
  – Filters out unneeded URLs
  – Processes suspicious URLs only
  – Kafka, S4, Trend Messaging Exchange
• Batch processing
  – Not real time
  – Broader scope
  – Hadoop MapReduce
96. Streaming Big Data
• URLs and their value are ephemeral
  – Need to act fast
  – No need to store them
• Useful data are few and far between
  – Filter out the rest
• Apply the Unix pipe concept, distributed style
  – Message-oriented programming
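The streaming idea is to act on each URL as it arrives, keep nothing, and pass on only the few interesting ones. It maps naturally onto a Unix-style filter that reads one record per line and writes only what survives. In the sketch below, `is_suspicious` is a hypothetical placeholder (a simple substring blocklist) standing in for whatever real analysis a stage would perform. `filter_stream` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical suspicion test: flags URLs containing any listed marker.
bool is_suspicious(const std::string& url, const std::vector<std::string>& markers) {
    for (const auto& m : markers)
        if (!m.empty() && url.find(m) != std::string::npos) return true;
    return false;
}

// Streaming filter stage: reads one URL per line from `in` and forwards only
// suspicious ones to `out`. Only the current line is held in memory.
// Returns the number of URLs forwarded.
std::size_t filter_stream(std::istream& in, std::ostream& out,
                          const std::vector<std::string>& markers) {
    std::size_t forwarded = 0;
    std::string url;
    while (std::getline(in, url)) {
        if (is_suspicious(url, markers)) {
            out << url << '\n';
            ++forwarded;
        }
    }
    return forwarded;
}
```

Called as `filter_stream(std::cin, std::cout, markers)`, the stage becomes a command that can sit between other commands in a pipe. Taking stream references rather than hard-coding `std::cin` keeps it easy to test with string streams.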
97. What is Message-Oriented Programming?
98. Traditionally
• Tightly coupled
  – Fixed service location
  – Protocol specific
  – Difficult to change or adapt to new business requirements
• Lack of separation between
  – Network handling
  – Application logic
102. Is that enough?
• Protocol independence
• Location independence
  – URL vs. channel ID
• Direct vs. indirect connection
  – Replacing the connection to a server with a connection to the message bus
103. Further encapsulation
• To attach to the message bus:
  – message-source | your-app-here | message-sink
  – message-source | app-1-here | app-2-here | message-sink
• Just like the Unix pipe concept
  – cat log.txt | gawk '{print $1}' | sort -u
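The `message-source | app | message-sink` pattern means an app sees only a stream of messages, never the transport. The in-process sketch below illustrates this; it is not TME's interface. Stages are plain functions from a batch of lines to a batch of lines, and `run_pipeline` chains them the way the shell chains `gawk '{print $1}'` and `sort -u`. `Stage`, `first_field`, `sort_unique` and `run_pipeline` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;
using Stage = std::function<Lines(const Lines&)>;  // in-process sketch, not TME's interface

// Like gawk '{print $1}': keep the first whitespace-separated field.
// Unlike awk, blank lines are skipped rather than emitted as empty lines.
Lines first_field(const Lines& in) {
    Lines out;
    for (const auto& line : in) {
        std::istringstream fields(line);
        std::string first;
        if (fields >> first) out.push_back(first);
    }
    return out;
}

// Like sort -u (byte-wise order, as with LC_ALL=C): sort and drop duplicates.
Lines sort_unique(const Lines& in) {
    Lines out(in);
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}

// Feed the output of each stage into the next, like a shell pipe.
Lines run_pipeline(Lines data, const std::vector<Stage>& stages) {
    for (const auto& stage : stages) data = stage(data);
    return data;
}
```

Because each stage depends only on its input and output, stages can be reordered, inserted or replaced without touching their neighbours. The slides make the same point about apps attached to the message bus.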
104. Messaging code is as simple as
#include <iostream>
#include <string>

// Read a name from standard input and write a greeting to standard output.
int main()
{
    std::string name;
    std::cin >> name;
    std::cout << "Hello! " << name << '\n';
}
105. Conceptually it is still data flow
• Each blue arrow is now a message channel / queue.
• Each component can be in a different location and can be dynamically rearranged with minimal effort.
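"Each arrow is a channel" can be illustrated in-process with a thread-safe blocking queue. Producers and consumers share only the channel, so either side can be moved or replaced without touching the other. This is a single-machine sketch: a real message bus adds networking, persistence and routing, none of which is shown. The `Channel` class and its methods are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical in-process message channel (one "blue arrow").
class Channel {
public:
    void send(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            assert(!closed_);  // sending after close is a programming error
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Signals that no more messages will be sent.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a message arrives; returns nullopt once closed and drained.
    std::optional<std::string> receive() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};
```

In use, a producer thread calls `send` and then `close`, while a consumer thread loops `while (auto m = ch.receive()) { ... }`. Neither thread knows about the other, only about the channel.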
109. It is not a pipe dream
110. Scalability
• Wait, we are dealing with Big Data here!
111. Scalability
• The message bus becomes the bottleneck
  – Each blue arrow represents input/output to the message bus
112. Partitioning Message Bus (1/2)
• Partition
  – Spread channels across different message servers
  – Load balance
  – Avoid network bottlenecks
  – Increase the number of channels the system can handle
• Because of messaging encapsulation
  – Server selection and load balancing are automatic.
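The slides do not say which scheme TME uses to spread channels across servers. One simple scheme, assumed here for illustration, hashes each channel name onto a server, so every client resolves the same channel to the same server without a central lookup. The sketch uses 64-bit FNV-1a rather than `std::hash`, so the mapping is identical across processes and builds. `fnv1a` and `server_for` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a hash: stable across processes, unlike an unspecified std::hash.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Pick the message server responsible for a channel (simple modulo hashing;
// the scheme is assumed for illustration, not necessarily TME's).
const std::string& server_for(const std::string& channel,
                              const std::vector<std::string>& servers) {
    assert(!servers.empty());
    return servers[fnv1a(channel) % servers.size()];
}
```

A design caveat: with modulo hashing, most channels move to a different server whenever the number of servers changes. Consistent hashing is the usual remedy when servers are added or removed often.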
113. Partitioning Message Bus (2/2)
118. How does WRS do it?
119. Big Data Tools
• In-house solutions
  – Trend Messaging Exchange
    • Coordinates and distributes work among a large set of machines
    • Enhanced scalability & reliability
    • Open sourced: https://github.com/trendmicro/tme/wiki
  – Lumber Jack: ultra-high-efficiency indexing system
    • Structures logs, allowing retrieval of vital statistics and information in < 10 seconds
      – Traditional scanning methods require > 10 minutes to days
      – At least 60 times faster (10 minutes vs. 10 seconds)
    • Highly specialized for Trend’s tasks
• Community-supported projects
  – Trend-customized Hadoop/HBase data storage
    • Involved with the HBase steering committee
  – Contributing to the open-source community
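Lumber Jack's internals are not described here. Still, the gap between "> 10 minutes to days" for scanning and "< 10 seconds" for retrieval is the usual difference between scanning every log record per query and building an index once. The sketch below contrasts the two on an in-memory log. The record layout (a domain and a hit count) and all names are assumptions for illustration, not Lumber Jack's design.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record layout, for illustration only.
struct LogRecord {
    std::string domain;   // queried domain
    std::uint64_t hits;   // hit count carried by this record
};

using HitIndex = std::unordered_map<std::string, std::uint64_t>;

// Scan approach: touches every record for every query, O(N) per query.
std::uint64_t total_hits_scan(const std::vector<LogRecord>& log, const std::string& domain) {
    std::uint64_t total = 0;
    for (const auto& r : log)
        if (r.domain == domain) total += r.hits;
    return total;
}

// Index approach: one O(N) pass builds per-domain totals up front...
HitIndex build_index(const std::vector<LogRecord>& log) {
    HitIndex index;
    for (const auto& r : log) index[r.domain] += r.hits;
    return index;
}

// ...after which each query is an average O(1) hash lookup.
std::uint64_t total_hits_indexed(const HitIndex& index, const std::string& domain) {
    auto it = index.find(domain);
    return it == index.end() ? 0 : it->second;
}
```

The trade-off mirrors the CLS observation: the index costs extra storage and a build pass, and in return each query becomes a lookup instead of a full scan.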
120. Big Data Begets Big Data, aka Business Intelligence
• We have built a large infrastructure for processing big data.
  – Big data generates more big data, which in turn generates business intelligence
  – For example: 8 billion URLs flowing through the system
    • 8 billion URLs flowing through 100 nodes will generate 800 billion log entries (a conservative estimate)
• Business intelligence extraction
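The 800 billion figure follows from assuming that every URL leaves one log entry on each of the 100 nodes, which the slide calls a conservative estimate. The sketch below reproduces that multiplication and converts the result to storage, given a hypothetical average entry size. `log_entries` and `log_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Log entries generated if each URL produces `entries_per_node` entries on
// each of `nodes` nodes (the slide's estimate assumes one entry per node).
constexpr std::uint64_t log_entries(std::uint64_t urls, std::uint64_t nodes,
                                    std::uint64_t entries_per_node = 1) {
    return urls * nodes * entries_per_node;
}

// Storage in bytes for those entries, given a hypothetical average entry size.
constexpr std::uint64_t log_bytes(std::uint64_t entries, std::uint64_t bytes_per_entry) {
    return entries * bytes_per_entry;
}

static_assert(log_entries(8'000'000'000ULL, 100) == 800'000'000'000ULL,
              "8 billion URLs x 100 nodes = 800 billion entries");
```

At a hypothetical 100 bytes per entry, 800 billion entries would occupy about 80 TB. Any extra entries per node scale that linearly.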