Cloud Practices in Trend Micro
Chung-Tsai Su, Ray Liao
Core Tech
Trend Micro Inc.
2012/11/06
Agenda

•  What is happening in Real World
•  Big Data in Trend Micro
•  Experience Sharing
    –  Galileo15
    –  WRS
•  Discussions
•  Q&A
What’s happening in the Real World
http://www.taiwannews.com.tw/etn/news_content.php?id=2017595
http://www.17inda.com/html/3/article-1937.html
Spam attack targeting our CEO


                                          Eva Chen
                                          CEO & Co-Founder
                                          Trend Micro
Spam attack targeting me




                                  http://rebelrowsers.com/AV8z8s/index.html




    http://videospornogratis.com.es/DGx9Zv/index.html



                    http://newarkpartytents.com/BvFNK66F/index.html
YAHOO Photography Federation (攝影聯合會)
http://www.bnext.com.tw/article/view/cid/103/id/24959
Big Data in Trend Micro
Smart Protection Network (SPN)




  Date: 2012/09/25
New Approach for Cyber Threat Solution

[Figure: intelligence sources and sensors]
  Sources:  CDN / xSP, Researcher Intelligence, Honeypot, Web Crawler
  Products: Trend Micro Mail Protection, Trend Micro Web Protection, Trend Micro Endpoint Protection
  300+ Million Worldwide Sensors
SPN Solution Architecture

[Figure: solution pipeline]
  Stages:   Sourcing → Processing & Analysis → Validate & Create Solution → Quality Assurance → Solution Distribution → Solution Adoption
  Inputs:   File, Web / URL, Email, Domain, IP
  Services: File Reputation Service, Web Reputation Service, Email Reputation Service, SPN Correlation
  Delivery: Smart Protection → Customer
  Community Intelligence (Feedback loop)
Challenges We Face

    6 TB of data and 15B lines of logs are received daily

                    It becomes the Big Data Challenge!
Overview – Smart Feedback Data Source

Exposure Layer
  Access
    Akamai (*): URL users accessed
    NSC_TmProxy_URLF_002: APP accessed malicious URL
  Content
    NSC_TmProxy_HFS_001: URL hosted suspicious/malicious file
    AMSP_TMBP_NSC_001: URL hosted shellcode
    BES_001: URL hosted suspicious/malicious content
    SAL_001: URL hosted suspicious/malicious content
    TMASE_001: Email contains suspicious/malicious content

Infection Layer
    VSAPI_001: File detected as suspicious/malicious
    CENSUS_001: File executed on endpoint
    AEGIS_001: APP with suspicious/malicious behavior
    RCA_001: Endpoint infection chain

Dynamic Layer
    CONAN_001: File detected by heuristic rules
    CONAN_002: Heuristic rule detection result of a file
    DCE_001: Clean result
    DRE_001, PEDif_001, LCE_001
Feedback Source in Terms of Products

                        Consumer         SMB          Enterprise   Gateway
                        Endpoint         Endpoint     Endpoint     Products
Schema ID               (Titanium/TIS)   (WFBS)       (OSCE)       (IMSS/IWSS)
Akamai                  V                V            V            V
NSC_TmProxy_URLF_002    V                V            V
NSC_TmProxy_HFS_001     V (*)            V            V
AMSP_TMBP_NSC_001       V (*)
BES_001                 V
SAL_001                 V
TMASE_001                                                          V
VSAPI_001               V                V            V
CENSUS_001              V
AEGIS_001               V                V            V
RCA_001                 V                V (*)
CONAN_001               V
CONAN_002               V
DCE_001                 V                V            V
DRE_001                 V
PEDif_001               V
LCE_001                 V
Feedback Volumes

[Bar chart, logarithmic axis from 1 to 100,000,000. Schema IDs as listed from top to bottom:
 NSC_TmProxy_URLF_002, AEGIS_001, VSAPI_001, CONAN_002, TMASE_001, DCE_001, CENSUS_001, CONAN_001,
 RCA_001, LCE_001, SAL_001, NSC_TmProxy_HFS_001, DRE_001, BES_001, AMSP_TMBP_NSC_001, PEDif_001]
Feedback Statistics

Unique Endpoints Count by Products

 Product        Unique GUID      %
 Housecall          275,934     1.64%
 Consumer        12,801,624    76.09%
 Enterprise       3,747,248    22.27%
 Total           16,824,806
Unique Endpoint Counts by Industry
(industry category feedback only from Enterprise products)

 Industry         Unique endpoints      %
 Not specified        1,564,471      47.43%
 Specified            1,734,141      52.57%
SPN High Level Architecture

[Figure: architecture diagram; component labels as they appear on the slide]
  Data Sourcing:     SPN Feedback, Honey Pot, CDN/xSP Log
  Ingestion:         Log Receiver, Log Post-processing
  Storage:           Hadoop Distributed File System (HDFS), HBase
  Processing:        MapReduce, Oozie, Adhoc-Query (Pig)
  Applications:      Correlation Platform, Threat Connect, Census, DRR
  Service Platform:  API Server/Portal (SSO), MySPN Platform, Solr Cloud, Web Pages
  Other components:  Akame, Tracking Logging System, Global Object Cache
  Messaging:         Trend Message Exchange (Message Bus)
  Connected to:      Email Reputation Service, 3rd-Party Data Feed, File Reputation Service, Web Reputation Service
Service Stack of SPN

  Customers
    Internal Customer:  SAL/MKT, TS, RD
    External Customer:  Consumer, Enterprise

  Service Catalogue
    Threat Landscape, Risk Management, User Experience
    MagicQ, ZDASE, Census, APT, Report, Widget
    Global Intelligence, Network Entity, Web, Mobile

  Cloud Infra
    Correlation Infrastructure

  Data Catalogue
    Akamai, Zone Files, FRS, WRS
    Census, Feedback, ERS, SPN
    Raw Data Feeds, Cooked Data Feeds
SPN Ecosystem
                                                                        API
                                                                              OLAP System
                                 MySPN Framework                                 Web Frontend

                                    Data Mining
                                               Solr

Sourcing                                                                             RDB
                                      Adhoc-Query
                         TME
                                                                Oozie
   Scribe                                Pig          Hive
            Avro
            Protobuf
   Flume    Streaming                 MapReduce Engine


                        Hadoop
                                         HCatalog
Data Inputs                                                                   Data Outputs
                                 HDFS                   HBase


                        OLTP System
                               Middleware / DB / K-V Stores


                                      Web Frontend
Experience Sharing
Galileo15
Chung-Tsai Su, Spark Tsao,
Wynne Chu, and Ray Liao
20 years ago, a young man said:




                    “Let’s fight bad guys”
At that time, bad guys appeared one by one

1 on 1
In recent years, bad guys mutate themselves …

    1 on 10
Nowadays, bad guys adopt community attacks …

                      SuPaWind

  We need a community-based solution
Challenges

Solution

Applications



Huge amount of data

 Source   Category            Number (thousands/day)   Size (MB/day)
 SPN      Feedback log                        17,000          11,000
 WRS      Web log                          4,500,000       1,500,000
 WRS      GPServer Crawl                     300,000             N/A
 FRS      File sourcing                           30          14,000
 ERS      Honeypot                            78,000         200,000
 ERS      IP level Queries                 1,200,000          46,000
What is the best data structure
to describe “Community”?

        Hash      Tree      Sequence      Clique
Clique

[Figure: example cliques labeled Botnet, Phishing and Fast-flux; domains shown: facevook.com, facebouk.com, facenook.com]
NP Hard




Galileo15 Makes it Possible!!!

•  2 observations from the data
   –  Sparse connection with low diameter preference
   –  Incomplete connection

                     Domain                                     IP
                                                           66.135.202.89
          fahrzeugteile.shop.ebay.de                       66.135.205.141
                       shop.ebay.ca                        66.135.213.211
                                                           66.135.213.215
       videogames.shop.ebay.com.au                         66.211.160.11
                                                           66.211.180.27

                                                       Missing edges
Galileo15
      Transforms massive raw data into community structures

                Host Domain                        Host IP
                                                66.135.202.89
   fahrzeugteile.shop.ebay.de                   66.135.205.141
                shop.ebay.ca                    66.135.213.211
                                                66.135.213.215
videogames.shop.ebay.com.au                     66.211.160.11
                                                66.211.180.27
Challenges

Solution

 Applications



Architecture of Galileo15

     Clique Enumeration  →  Clique Matching  →  Clique Ranking

     Clique Matching:  matches cliques between Time0 and Time1
     Clique Ranking:
       •  Static Rank
       •  Dynamic Rank
Clique Enumeration: Reduces workload

   Input:               1 hour of web browsing log (180 million logs)
   Hadoop:              7 reducers, < 5 minutes
   Output:              700,000 cliques
   Workload reduction:  40.3%
   Runs at WRS ALPS Env. (40 machines)
Clique Matching: Saves computation

   Run time by method
     Brute force:             > 1 day
     Hash-based:              20 mins / 15 mins (both values appear on the slide)
     Multi-layer Hash-based:  < 2 mins
   Reduction:                 90.8%
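The slide does not spell out how the matching works, so the following is only a minimal sketch of the general idea behind replacing pairwise comparison with hashing. It assumes a clique is a pair of sets (domains, IPs) and that "matching" means finding cliques that appear unchanged at both T0 and T1. The names `make_clique`, `match_brute_force` and `match_hash_based` are hypothetical, and the multi-layer variant is not reproduced here.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumption: a clique is a (domains, ips) pair of immutable sets.
Clique = Tuple[FrozenSet[str], FrozenSet[str]]


def make_clique(domains: Iterable[str], ips: Iterable[str]) -> Clique:
    """Build a hashable clique; member order does not matter."""
    return (frozenset(domains), frozenset(ips))


def match_brute_force(t0: List[Clique], t1: List[Clique]) -> List[Clique]:
    """Compare every T1 clique with every T0 clique: O(len(t0) * len(t1))."""
    return [c1 for c1 in t1 if any(c1 == c0 for c0 in t0)]


def match_hash_based(t0: List[Clique], t1: List[Clique]) -> List[Clique]:
    """Index T0 cliques in a hash set, then probe it once per T1 clique."""
    seen = set(t0)
    return [c1 for c1 in t1 if c1 in seen]


# Illustrative data only (documentation domains and addresses).
t0 = [
    make_clique(["a.example"], ["192.0.2.1"]),
    make_clique(["b.example", "c.example"], ["192.0.2.2"]),
]
t1 = [
    make_clique(["c.example", "b.example"], ["192.0.2.2"]),
    make_clique(["d.example"], ["192.0.2.9"]),
]
unchanged = match_hash_based(t0, t1)
```

Both functions return the same cliques; the hash-based one does a single expected constant-time lookup per T1 clique instead of scanning all of T0.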
Challenges

 Solution

Applications



Why “Community that Fits”?

                         Domain                     IP




    Server420.at.youporn.com                             87.248.207.141




Why “Community that Fits”?

                          Domain                    IP

                                                         203.77.186.249
                                                         69.164.22.140
    Server114.at.youporn.com                             69.164.22.153
                                                         69.164.22.154
    Server346.at.youporn.com
                                                         87.248.203.50
    Server420.at.youporn.com                             87.248.207.141
                                                         87.248.210.147
    Server730.at.youporn.com
                                                         87.248.211.194
    Server923.at.youporn.com                             87.248.211.223
                                                         87.248.212.55
                                                         87.248.218.132



Why “Community that Fits”?

                          Domain                    IP

                                                         203.77.186.249
                                                         69.164.22.140
    Server114.at.youporn.com                             69.164.22.153
                                                                          WTP
                                                         69.164.22.154
    Server346.at.youporn.com
                                                         87.248.203.50
                                                                          DUL from ERS
    Server420.at.youporn.com                             87.248.207.141
                                                         87.248.210.147
    Server730.at.youporn.com                                              Malicious
                                                         87.248.211.194
    Server923.at.youporn.com                             87.248.211.223
                                                                          Phishing
                                                         87.248.212.55
                                                         87.248.218.132



Some porn websites are not yet blocked, but are caught by Galileo15

  Domains connected to IP 203.77.186.249:
                      amateurmaturevoyeur.pornblink.com
                      bareasswhipping.pornblink.com
                      desihotpoint.com
                      freexxxamaturefucking.pornblink.com
                      fxxkinsilly.com
                      goldengatebridgebuilt.pornblink.com
                      hotolderwomenshowingpants.pornblink.com
                      matureamateurgallerysoftcore.pornblink.com
                      skinnyteenanallesbian.pornblink.com
                      spermster.com

  Legend: WTP, Phishing, Malicious, Pornography
Applications

[Figure: number of cliques (#Cliques) over time. Domain–IP snapshots at T0, T0+15, T0+30, T0+45 and T0+60 pass through Clique Enumeration, Clique Matching and Clique Ranking.]

           1   Whitelisting
           2   Anomaly detection
           3   Web Hosting
           4   Fast Flux
More?




History Evolution

[Timeline figure, axis 1980 to 2010; stages from earliest to latest: Classification, Clustering, Sequence, Clique]
Summary
•  Proposes a brand-new community representation

•  Provides a powerful graph-based correlation engine

•  Reduces workload by 40.3%

•  Brings huge business value
Q&A

ALGORITHM
Clique Enumeration algorithm (1/2)

  input:    (Domain, IP) pairs: Domain1, IP1; Domain2, IP2; Domain3, IP3; …; Domainn, IPn
  map:      several map tasks over the input
  shuffle:  shuffling by key, sorting by key
  reduce:   several reduce tasks
  output:   one record per domain, listing its IPs:
              Domain1 ; IP1,1 , IP1,2 , …
              Domain2 ; IP2,1 , IP2,2 , …
              …
              Domaini ; IPi,1 , IPi,2 , …
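The first job above groups IPs by domain. A minimal single-process sketch of that map, shuffle/sort and reduce flow might look like the following. It is plain Python rather than Hadoop code, the function names are hypothetical, and the sample records use documentation domains and addresses rather than real log data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Emit (key, value) = (domain, ip) for every log record."""
    for domain, ip in records:
        yield domain, ip


def shuffle_and_sort(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group values by key and sort by key, as the framework would do."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped: Iterable[Tuple[str, List[str]]]) -> Iterator[Tuple[str, List[str]]]:
    """Emit one record per domain with its distinct IPs."""
    for domain, ips in grouped:
        yield domain, sorted(set(ips))


def enumerate_step1(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    return list(reduce_phase(shuffle_and_sort(map_phase(records))))


# Illustrative records only.
logs = [
    ("shop.example", "192.0.2.10"),
    ("shop.example", "192.0.2.11"),
    ("cars.example", "192.0.2.10"),
    ("shop.example", "192.0.2.10"),  # duplicate log line
]
step1 = enumerate_step1(logs)
```

Here `step1` holds one entry per domain, each with a de-duplicated, sorted IP list, which matches the "Domaini ; IPi,1 , IPi,2 , …" record shape on the slide.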
Clique Enumeration algorithm (2/2)

  input:    the per-domain records from step 1/2:
              Domain1 ; IP1,1 , IP1,2 , …
              …
              Domaini ; IPi,1 , IPi,2 , …
  map:      several map tasks over the input
  shuffle:  shuffling by key, sorting by key
  reduce:   several reduce tasks
  output:   (no content shown on the slide)
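The slide does not show which key the second job shuffles on or what it outputs. One plausible reading, assumed here purely for illustration, is that domains sharing the same IP set are grouped into one candidate community. The function name `group_by_ip_set` is hypothetical, and this exact-match grouping is a simplification that ignores the quasi-clique density parameter.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple


def group_by_ip_set(
    step1: Iterable[Tuple[str, List[str]]]
) -> List[Tuple[List[str], List[str]]]:
    """Group domains whose IP lists are identical (assumed shuffle key: the IP set)."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for domain, ips in step1:
        groups[frozenset(ips)].append(domain)
    # Emit (domains, ips) pairs with deterministic ordering inside each pair.
    return [(sorted(domains), sorted(ips)) for ips, domains in groups.items()]


# Illustrative step-1 output only.
per_domain = [
    ("a.example", ["192.0.2.1", "192.0.2.2"]),
    ("b.example", ["192.0.2.2", "192.0.2.1"]),
    ("c.example", ["192.0.2.9"]),
]
candidates = group_by_ip_set(per_domain)
```

With this sample, `a.example` and `b.example` end up in the same candidate because their IP sets are equal regardless of order.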
Parameters of Clique Enumeration algorithm

 §  γ : edge density of a quasi-clique
    –  |E| ≥ γ |L| |R|
 §  MinE: minimum support of each edge
    –  #E(li, rj) ≥ MinE
 §  MinL, MaxL: minimum and maximum number of objects on the left side of a clique
    –  MinL ≤ |L| ≤ MaxL
 §  MinR, MaxR: minimum and maximum number of objects on the right side of a clique
    –  MinR ≤ |R| ≤ MaxR
 §  Min_DegL, Min_DegR: minimum degrees of objects on the left and right sides of a clique, respectively
    –  Deg(li) ≥ Min_DegL ∀li ∈ L; Deg(rj) ≥ Min_DegR ∀rj ∈ R

 Example: bipartite graph G(L, R, E)
    L = {l1, l2, l3, l4}
    R = {r1, r2, r3}
    E ⊆ {(li, rj) | 1 ≤ i ≤ 4, 1 ≤ j ≤ 3}
    |L| = 4, |R| = 3
    Deg(l1) = 2, Deg(l2) = 3
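The constraints above can be checked mechanically. The sketch below is one way to do so, under two assumptions that are not on the slide: edges whose support is below MinE are simply dropped before the other checks, and the edge list is hypothetical sample data chosen only to agree with Deg(l1) = 2 and Deg(l2) = 3. The names `Params` and `satisfies` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Params:
    gamma: float      # minimum edge density
    min_e: int        # minimum support of each edge
    min_l: int
    max_l: int
    min_r: int
    max_r: int
    min_deg_l: int
    min_deg_r: int


def satisfies(left: Set[str], right: Set[str],
              support: Dict[Tuple[str, str], int], p: Params) -> bool:
    """Return True if (left, right) meets every parameter constraint."""
    # Assumption: edges with support below MinE do not count.
    edges = {(l, r) for (l, r), n in support.items()
             if l in left and r in right and n >= p.min_e}
    # MinL <= |L| <= MaxL and MinR <= |R| <= MaxR
    if not (p.min_l <= len(left) <= p.max_l and p.min_r <= len(right) <= p.max_r):
        return False
    # |E| >= gamma * |L| * |R|
    if len(edges) < p.gamma * len(left) * len(right):
        return False
    # Degree constraints on both sides.
    deg_l = {l: 0 for l in left}
    deg_r = {r: 0 for r in right}
    for l, r in edges:
        deg_l[l] += 1
        deg_r[r] += 1
    return (all(d >= p.min_deg_l for d in deg_l.values())
            and all(d >= p.min_deg_r for d in deg_r.values()))


L = {"l1", "l2", "l3", "l4"}
R = {"r1", "r2", "r3"}
# Hypothetical edge supports: 8 of the 12 possible edges are present.
support = {("l1", "r1"): 3, ("l1", "r2"): 2,
           ("l2", "r1"): 1, ("l2", "r2"): 4, ("l2", "r3"): 2,
           ("l3", "r2"): 1, ("l3", "r3"): 5,
           ("l4", "r3"): 2}
params = Params(gamma=0.6, min_e=1, min_l=2, max_l=10,
                min_r=2, max_r=10, min_deg_l=1, min_deg_r=2)
ok = satisfies(L, R, support, params)
```

With 8 edges out of 12 the density is about 0.67, so the sample passes at γ = 0.6 and would fail at γ = 0.7.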
Specification for Hadoop Environment

               Number of Machines             40
                 Machine Type            Dell PE2950
                     CPU            QuadCore Xeon 5410 x 2
                     RAM              4GB (667MHz) x 2
                      Disk           300 GB SATA 7.2K x 6
                      OS               RHEL AS4, 32 bits
Environment for POC




Time consumption on Clique Enumeration algorithm (1/4)

[Chart: Time (sec.) versus #Reducers]
Time consumption on Clique Enumeration algorithm (2/4)

 Times in seconds

 Number of Reducers   1st mapper   1st reducer   2nd mapper   2nd reducer   Total time
                  1           27          1201           52            97         1377
                  2           27           556           27            54          664
                  3           27           357           18            39          441
                  4           27           306           15            33          381
                  5           27           249           12            30          318
                  6           27           225           12            27          291
                  7           27           195            9            24          255
                  8           27           193            9            23          252
                  9           27           178            9            22          236
                 10           27           165            9            21          222
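To read the table above as a scaling curve, the speedup relative to one reducer and the per-reducer efficiency can be computed directly from the Total time column. This is a small worked calculation on the slide's own numbers; the function names are hypothetical.

```python
# Total time in seconds per number of reducers, copied from the table.
total_time = {1: 1377, 2: 664, 3: 441, 4: 381, 5: 318,
              6: 291, 7: 255, 8: 252, 9: 236, 10: 222}


def speedup(reducers: int) -> float:
    """Speedup relative to the single-reducer run."""
    return total_time[1] / total_time[reducers]


def efficiency(reducers: int) -> float:
    """Speedup divided by the number of reducers (1.0 would be ideal)."""
    return speedup(reducers) / reducers


for n in sorted(total_time):
    print(f"{n:2d} reducers: speedup {speedup(n):.2f}, efficiency {efficiency(n):.2f}")
```

The gain flattens after 7 reducers (255 s versus 252 s at 8), which is consistent with the 7-reducer configuration quoted on the earlier Clique Enumeration slide.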
  
Time consumption on Clique Enumeration algorithm (3/4)

[Chart: Number of logs (left axis, 0 to 4,000,000,000) and Time in sec. (right axis, 0 to 6000) versus Hours of input (1, 2, 4, 8, 16)]
Time consumption on Clique Enumeration algorithm (4/4)

  Hours       #records   #cliques  Mappers  1st map  1st reduce  2nd map  2nd reduce  Total time
      1    182,642,849    730,651      416       27         195        9          24         293
      2    375,836,783  1,008,351      849       27         300       15          33         505
      4    763,789,635  1,323,948     1717       27         739       24          57         990
      8  1,556,210,147  1,834,466     3466       27        1810       36          84        2270
     16  3,773,804,326  2,518,523     8280       27        4203       69         188        5304
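
One way to read this table is as throughput: records processed per second of total job time. The sketch below is an added illustration, not from the slides; `runs`, `records_per_second`, and `throughput` are assumed names, and the figures are copied from the "#records" and "Total time" columns.

```python
# (hours of logs, number of records, total time in seconds), copied from the table.
runs = [
    (1, 182_642_849, 293),
    (2, 375_836_783, 505),
    (4, 763_789_635, 990),
    (8, 1_556_210_147, 2270),
    (16, 3_773_804_326, 5304),
]


def records_per_second(records: int, seconds: int) -> float:
    """Average processing rate over the whole job."""
    return records / seconds


# Throughput keyed by hours of logs processed.
throughput = {hours: records_per_second(r, s) for hours, r, s in runs}

if __name__ == "__main__":
    for hours, rate in throughput.items():
        print(f"{hours:2d} h of logs: {rate:,.0f} records/sec")
```

The rate stays between roughly 0.6 and 0.8 million records per second across all five runs, which suggests total time grows close to linearly with input size.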
  
Multi-Layer Hash-based Matching Algorithm
[Figure: hash table of T0 cliques (l1, l2, ...) arranged in layers by clique size (Size=1, Size=2~5, Size>6) and linked to T1 cliques (r1, r2, ...)]

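
The slide gives only the diagram, so the following is a minimal sketch of one way a multi-layer hash-based matcher could work, not the authors' implementation. It assumes that cliques are sets of node IDs, that layers follow the size bands in the figure (with sizes of 6 and above in the last layer), that only cliques in the same layer are compared, and that Jaccard similarity with a hypothetical `threshold` decides a match. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Clique = FrozenSet[str]


def layer_of(clique: Clique) -> int:
    """Map a clique to a size layer: 0 for size 1, 1 for sizes 2-5, 2 for larger."""
    size = len(clique)
    if size == 1:
        return 0
    if size <= 5:
        return 1
    return 2


def build_index(cliques: Iterable[Clique]) -> Dict[Tuple[int, str], Set[Clique]]:
    """Hash table keyed by (layer, member) -> cliques in that layer with that member."""
    index: Dict[Tuple[int, str], Set[Clique]] = defaultdict(set)
    for clique in cliques:
        for member in clique:
            index[(layer_of(clique), member)].add(clique)
    return index


def jaccard(a: Clique, b: Clique) -> float:
    """Shared members divided by all members of the two cliques."""
    return len(a & b) / len(a | b)


def match(t0: Iterable[Clique], t1: Iterable[Clique],
          threshold: float = 0.5) -> List[Tuple[Clique, Clique, float]]:
    """Pair each T1 clique with same-layer T0 cliques that share a member."""
    index = build_index(t0)
    pairs = []
    for right in t1:
        layer = layer_of(right)
        candidates: Set[Clique] = set()
        for member in right:
            candidates |= index.get((layer, member), set())
        for left in candidates:
            score = jaccard(left, right)
            if score >= threshold:
                pairs.append((left, right, score))
    return pairs
```

Under these assumptions the matcher never compares cliques from different layers, which cuts the number of comparisons but can also skip some pairs that a single-layer hash would report.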
Community Matching Algorithm (1/2)

     Algorithm                 Time (sec.)    #Clique-pairs
     Brute force                   129,630          483,919
     Hash-based                      1,194          483,919
     Multi-layer hash-based            110          424,213

[Figure: matching of cliques between Time0 and Time1, annotated "90.8%"]

Community Matching Algorithm (2/2)
[Figure: number of clique-pairs (log scale, 1 to 1,000,000) versus similarity (0% to 100%) for hash-based and multi-layer hash-based matching]

Experience Sharing
Web Reputation Service (WRS)
Web Reputation System

2012/11/8 Confidential | Copyright 2012 Trend Micro Inc.
  
Big World, Big Data
         •  Important numbers for WRS
                –  8 billion queries daily
                –  900 million URLs analyzed daily
                –  < 0.01% of daily URLs identified as malicious
                     •  Finding a needle in a haystack
  
Processing Big Data
         •  Content analysis: 900 million unique URLs / 24 hr ≈ 10K URLs per second (0.1 ms per URL)
                –  Challenge: How to coordinate, maintain, and distribute work among a large set of machines (> 500 machines)?
         •  Raw log analysis: 3 terabytes of data each day
                –  Challenge: How to store the logs in a way that is reliable and fast for retrieving relevant data?
                –  How to process logs (present + historical, ~500 TB) to provide vital statistics and trends?

[Figure: data-flow diagram with the labels User Queries (8 billion URLs per day), Unique URLs (900 million URLs per day), Content Analysis, Malicious URLs (19K URLs per day), Raw Log (3 terabytes per day), Anomaly Detection, Vital Statistics, Historical Trend, Present View]
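
The per-second figures on this slide follow from simple arithmetic on the daily volumes. The sketch below is added for illustration; the variable names are assumptions and only the two daily volumes come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

unique_urls_per_day = 900_000_000    # content analysis volume, from the slide
queries_per_day = 8_000_000_000      # user query volume, from the slide

# Sustained rate the content analysis has to keep up with.
urls_per_second = unique_urls_per_day / SECONDS_PER_DAY
# Time budget per URL if URLs were handled strictly one at a time.
ms_per_url = 1000 / urls_per_second
# Sustained rate of incoming user queries.
queries_per_second = queries_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"{urls_per_second:,.0f} unique URLs per second")
    print(f"{ms_per_url:.3f} ms per URL")
    print(f"{queries_per_second:,.0f} queries per second")
```

900 million URLs per day works out to about 10.4K URLs per second, or just under 0.1 ms per URL, which matches the rounded figures on the slide.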
  
Today's Agenda
         •  Discussion of the real-world design
                –  Constraints
                –  Requirements
         •  Sample of tools available
                –  When to use them?
                –  How?
  
What are we trying to do with Big Data?
  
Usage Triangle
         •  Historical domain-IP relations
         •  Historical access pattern
         •  Known malicious actors
         …

         •  Detects abnormal behavior
         •  Groups malicious domains
         •  Potential malicious URLs
         …

         •  Malicious activities?
  
Constraints Triangle
         •  What data to store?
         •  How much data to store?
         •  For how long?
         •  Readily accessible
         •  $$$

         •  Threat coverage

         •  How fast can discovery be?
  
CLS Observation
         •  As with the CAP theorem, where only 2 out of 3 constraints can be satisfied, threat discovery can satisfy only 2 out of its 3 constraints (coverage, latency, storage).

                –  (Coverage+, Latency+): It is impossible to achieve fast discovery and large coverage without an enormous data store to provide the necessary information for decision making.

                –  (Latency+, Storage+): By focusing on a smaller set of URLs, we can provide fast discovery without the need for a huge data store.

                –  (Coverage+, Storage+): By allowing a longer discovery time, we can enhance coverage without using a large data store.
  
It is all about the trade-off
  
Two schools of thought (1/2)
         •  (Storage+, Latency+)
                –  Attacks are
                    •  Wave-like in nature
                             –  Sudden appearance
                             –  Short lifespan
                    •  Disposable
                             –  Use once and throw away
                    •  Regionalized
                             –  Global epidemics are less common
                    •  Few
                             –  < 0.01% of the daily unique URLs are malicious
  
Streaming Example
  
Two schools of thought (2/2)
         •  (Coverage+, Latency+)
                –  History repeats itself
                     •  So does hackers' infrastructure (not so throwaway)

                –  Protecting coverage is essential
                     •  Detectable by more thorough investigation with larger context

                –  Future-proof
                     •  Our solution reflects past knowledge
                     •  If we don't accumulate, adapt, and evolve our knowledge, our solution will become obsolete
  
Batch Example
  
It Boils Down to Streaming vs. Batch Processing
         •  Streaming looks at queries in real time
                –  Filters out unneeded URLs
                –  Processes suspicious URLs only
                –  Kafka, S4, Trend Messaging Exchange

         •  Batch processing
                –  Not real-time
                –  Broader scope
                –  Hadoop MapReduce
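
The contrast between the two styles can be sketched in a few lines. This is an added illustration under stated assumptions, not code from the talk: `SUSPICIOUS_MARKERS`, `looks_suspicious`, `stream_filter`, and `batch_domain_counts` are hypothetical names, and the marker list stands in for whatever real detection signals a production system would use.

```python
from collections import Counter
from typing import Iterable, Iterator, List

# Hypothetical markers; a real system would use far richer signals.
SUSPICIOUS_MARKERS = (".exe", "/download/")


def looks_suspicious(url: str) -> bool:
    return any(marker in url for marker in SUSPICIOUS_MARKERS)


def stream_filter(queries: Iterable[str]) -> Iterator[str]:
    """Streaming: look at each query once as it arrives, keep only suspicious URLs."""
    for url in queries:
        if looks_suspicious(url):
            yield url


def batch_domain_counts(stored_log: List[str]) -> Counter:
    """Batch: scan the whole stored log to build a broader view (hits per domain)."""
    counts: Counter = Counter()
    for url in stored_log:
        domain = url.split("/")[2] if "://" in url else url.split("/")[0]
        counts[domain] += 1
    return counts
```

The streaming function needs no storage and yields results immediately, while the batch function needs the full log but can answer questions that span every record.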
  
Streaming Big Data
         •  A URL and its value are ephemeral
                –  Need to act fast
                –  No need to store them

         •  Useful data are few and far between
                –  Filter out the rest

         •  Apply the Unix pipe concept, distributed style
                –  Message-oriented programming
  
What is Message Oriented Programming?
  
Traditionally
                •  Tightly coupled
                    –  Fixed service location
                    –  Protocol specific
                    –  Difficult to change or adapt to new business requirements

                •  Lack of separation between
                    –  Network handling
                    –  Application logic
  
Mixing network and application logic
                •  Complex
                •  Time wasted

                #include <sys/types.h>
                #include <sys/socket.h>
                #include <netinet/in.h>
                #include <arpa/inet.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <string.h>
                #include <unistd.h>

                int main(void){
                    struct sockaddr_in stSockAddr;
                    /* Create a TCP socket */
                    int SocketFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
                    memset(&stSockAddr, 0, sizeof(stSockAddr));
                    stSockAddr.sin_family = AF_INET;
                    stSockAddr.sin_port = htons(5566);
                    stSockAddr.sin_addr.s_addr = INADDR_ANY;
                    /* Bind to port 5566 on all interfaces and wait for one client */
                    bind(SocketFD, (const struct sockaddr *)&stSockAddr, sizeof(stSockAddr));
                    listen(SocketFD, 10);
                    int ConnectFD = accept(SocketFD, NULL, NULL);
                    //do something
                    close(ConnectFD);
                    close(SocketFD);
                    return 0;
                }
  
Is that enough?
         •  Protocol independence
         •  Location independence
                –  URL vs. channel ID
         •  Direct vs. indirect connection
                –  Replacing the connection to a server with a connection to the message bus
  
Further encapsulation
                •  To attach to the message bus:
                      Ø message-source | your-app-here | message-sink
                      Ø message-source | app-1-here | app-2-here | message-sink

                •  Just like the Unix pipe concept
                      –  cat log.txt | gawk '{print $1}' | sort -u
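
The pipe analogy can be sketched with generators, where each stage reads messages from the previous one and passes results on. This is an added illustration, not the Trend Messaging Exchange API: `first_field`, `unique_sorted`, and `pipeline` are hypothetical names, and the two stages only roughly imitate `gawk '{print $1}'` and `sort -u` (blank lines are skipped here).

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[str]], Iterator[str]]


def first_field(lines: Iterable[str]) -> Iterator[str]:
    """Roughly like gawk '{print $1}': emit the first whitespace-separated field."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def unique_sorted(lines: Iterable[str]) -> Iterator[str]:
    """Roughly like sort -u: emit each distinct line once, in sorted order."""
    yield from sorted(set(lines))


def pipeline(source: Iterable[str], *stages: Stage) -> Iterable[str]:
    """Chain stages so that source | stage-1 | stage-2 | ... reads left to right."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    log = ["b.example 200", "a.example 404", "b.example 200"]
    for line in pipeline(log, first_field, unique_sorted):
        print(line)
```

Each stage knows nothing about where its input comes from or where its output goes, which is the property that lets stages be rearranged or moved to other machines.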
  
Messaging code is as simple as

         #include <iostream>
         #include <string>

         int main() {
             std::string name;
             std::cin >> name;                // read one word from standard input
             std::cout << "Hello! " << name;  // write the reply to standard output
         }
  
Conceptually it is still data flow
          •  Each blue arrow is now a message channel / queue.

          •  Each component can be in a different location, and dynamically rearranged with minimum effort
  
Intra-PC vs. Extra-PC messaging

Coordinating tools (1/2)

Coordinator (2/2)

It is not a pipe dream

Scalability
         •  Wait, we are dealing with Big Data here!
  
Scalability
         •  The message bus becomes the bottleneck
                –  Each blue arrow represents input/output to the message bus
  
Partitioning Message Bus (1/2)
           •  Partition
                –  Spread channels across different message servers
                –  Load balance
                –  Avoid network bottlenecks
                –  Increase the number of channels the system can handle
           •  Because of messaging encapsulation
                –  Server selection and load balancing are automatic.
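
The slides do not say how a server is chosen for a channel, so the following is only a sketch of one common approach: hash the channel ID and use the result to pick a server. The function name `server_for_channel` and the server names are hypothetical, and this is not a description of how Trend Messaging Exchange does it.

```python
import hashlib
from typing import List


def server_for_channel(channel_id: str, servers: List[str]) -> str:
    """Pick a message server for a channel using a stable hash of the channel ID."""
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the choice is the same on every machine.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    servers = ["mq-1", "mq-2", "mq-3"]  # hypothetical server names
    for channel in ("urls.incoming", "urls.suspicious", "urls.malicious"):
        print(channel, "->", server_for_channel(channel, servers))
```

Because every client computes the same answer from the channel ID alone, applications never name a server themselves. A plain modulo scheme does remap most channels when the server list changes, so a real deployment might prefer consistent hashing or a lookup service.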
  
Partitioning Message Bus (2/2)

Message Oriented Programming Tips

Parallel Upgrade (1/2)

Parallel Upgrade (2/2)

Sharing Context

How does WRS do it?
  
Big Data Tools
         •  In-house solutions
                –  Trend Messaging Exchange
                    •  Coordinates and distributes work among a large set of machines
                    •  Enhanced scalability & reliability
                    •  Open sourced: https://github.com/trendmicro/tme/wiki
                –  Lumber Jack: ultra-high-efficiency indexing system
                    •  Structures logs, allowing retrieval of vital statistics and information in < 10 seconds
                             –  Traditional scanning methods require > 10 minutes to days
                             –  60 times savings in time
                    •  Highly specialized for Trend's tasks

         •  Community-supported projects
                –  Trend-customized Hadoop/HBase data storage
                    •  Involved with the HBase steering committee
                             –  Contributes to the open source community
  
Big Data Begets Big Data, aka Business Intelligence
         •  We have built a large infrastructure for processing big data.
                –  Big data generates more big data, which generates business intelligence
                –  For example: 8 billion URLs flowing through the system
                    •  8 billion URLs flowing through 100 nodes will generate 800 billion log entries (a conservative estimate)
                    •  Business intelligence extraction
  
Discussion
Scale-up vs. Scale-out
  




http://natishalom.typepad.com/.a/6a00d835457b7453ef01348697aa8a970c-pi
SQL vs. NoSQL
  




http://community.sageaccpac.com/blogs/r_and_d/archive/2012/01/28/nosql-for-erp.aspx
Public Cloud vs. Private Cloud
  
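Rethinking cloud computing by ruling out five things it is not

    •  The cloud is not a place
    •  The cloud is not the same as server virtualization
    •  The cloud does not operate as an isolated island
    •  The cloud does not develop from the top down
    •  The cloud is not just talk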
  




http://www.bnext.com.tw/focus/view/cid/103/id/23682

More Related Content

Viewers also liked

Tetuan Valley Startup School - Guest mentor Byron Stanford
Tetuan Valley Startup School - Guest mentor Byron StanfordTetuan Valley Startup School - Guest mentor Byron Stanford
Tetuan Valley Startup School - Guest mentor Byron StanfordLuis Rivera
 
Dance & Technology - Move It 2014
Dance & Technology - Move It 2014Dance & Technology - Move It 2014
Dance & Technology - Move It 2014Natasha Reynolds
 
Lenin (reBranding) by MOST Creative Club
Lenin (reBranding) by MOST Creative ClubLenin (reBranding) by MOST Creative Club
Lenin (reBranding) by MOST Creative ClubMOST Creative Practice
 
Srm ii presentation - Bob Hill
Srm ii presentation - Bob HillSrm ii presentation - Bob Hill
Srm ii presentation - Bob Hillbhill333
 

Viewers also liked (11)

Tetuan Valley Startup School - Guest mentor Byron Stanford
Tetuan Valley Startup School - Guest mentor Byron StanfordTetuan Valley Startup School - Guest mentor Byron Stanford
Tetuan Valley Startup School - Guest mentor Byron Stanford
 
APG
APGAPG
APG
 
Facebook
FacebookFacebook
Facebook
 
Dance & Technology - Move It 2014
Dance & Technology - Move It 2014Dance & Technology - Move It 2014
Dance & Technology - Move It 2014
 
Going mobile
Going mobileGoing mobile
Going mobile
 
Lenin (reBranding) by MOST Creative Club
Lenin (reBranding) by MOST Creative ClubLenin (reBranding) by MOST Creative Club
Lenin (reBranding) by MOST Creative Club
 
All aboutmango
All aboutmangoAll aboutmango
All aboutmango
 
Srm ii presentation - Bob Hill
Srm ii presentation - Bob HillSrm ii presentation - Bob Hill
Srm ii presentation - Bob Hill
 
New impact talk
New impact talkNew impact talk
New impact talk
 
Asamblea 18 04-2013
Asamblea 18 04-2013Asamblea 18 04-2013
Asamblea 18 04-2013
 
Jin2
Jin2Jin2
Jin2
 

Similar to Cloud Practices and Big Data Trends at Trend Micro

Monetizing The Enterprise: Borderless Networks
Monetizing The Enterprise: Borderless NetworksMonetizing The Enterprise: Borderless Networks
Monetizing The Enterprise: Borderless NetworksCisco Service Provider
 
Cisco Connect 2018 Thailand - Enabling the next gen data center transformatio...
Cisco Connect 2018 Thailand - Enabling the next gen data center transformatio...Cisco Connect 2018 Thailand - Enabling the next gen data center transformatio...
Cisco Connect 2018 Thailand - Enabling the next gen data center transformatio...NetworkCollaborators
 
Building The Right Network
Building The Right NetworkBuilding The Right Network
Building The Right NetworkCisco Canada
 
Service Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay KidService Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay KidMyNOG
 
Using Security to Build with Confidence in AWS - Trend Micro
Using Security to Build with Confidence in AWS - Trend Micro Using Security to Build with Confidence in AWS - Trend Micro
Using Security to Build with Confidence in AWS - Trend Micro Amazon Web Services
 
apidays LIVE New York 2021 - Supercharge microservices with Service Mesh by S...
apidays LIVE New York 2021 - Supercharge microservices with Service Mesh by S...apidays LIVE New York 2021 - Supercharge microservices with Service Mesh by S...
apidays LIVE New York 2021 - Supercharge microservices with Service Mesh by S...apidays
 
Istio Service Mesh
Istio Service MeshIstio Service Mesh
Istio Service MeshLew Tucker
 
Cisco Connect Ottawa 2018 consuming public and private clouds
Cisco Connect Ottawa 2018 consuming public and private cloudsCisco Connect Ottawa 2018 consuming public and private clouds
Cisco Connect Ottawa 2018 consuming public and private cloudsCisco Canada
 
#VMUGMTL - Radware Breakout
#VMUGMTL - Radware Breakout#VMUGMTL - Radware Breakout
#VMUGMTL - Radware Breakout1CloudRoad.com
 
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023VMware Tanzu
 
Cisco Connect Ottawa 2018 the intelligent network with Cisco Meraki
Cisco Connect Ottawa 2018 the intelligent network with Cisco MerakiCisco Connect Ottawa 2018 the intelligent network with Cisco Meraki
Cisco Connect Ottawa 2018 the intelligent network with Cisco MerakiCisco Canada
 
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...NetworkCollaborators
 
VMworld 2013: Get on with Business - VMware Reference Architectures Help Stre...
VMworld 2013: Get on with Business - VMware Reference Architectures Help Stre...VMworld 2013: Get on with Business - VMware Reference Architectures Help Stre...
VMworld 2013: Get on with Business - VMware Reference Architectures Help Stre...VMworld
 
Harbour IT & VMware - vForum 2010 Wrap
Harbour IT & VMware - vForum 2010 WrapHarbour IT & VMware - vForum 2010 Wrap
Harbour IT & VMware - vForum 2010 WrapHarbourIT
 
The Impact of Digital Transformation on Enterprise Security
The Impact of Digital Transformation on Enterprise SecurityThe Impact of Digital Transformation on Enterprise Security
The Impact of Digital Transformation on Enterprise SecurityDevOps.com
 
Cisco Connect Toronto 2018 the intelligent network with cisco meraki
Cisco Connect Toronto 2018   the intelligent network with cisco merakiCisco Connect Toronto 2018   the intelligent network with cisco meraki
Cisco Connect Toronto 2018 the intelligent network with cisco merakiCisco Canada
 
2021 01-27 reducing risk of ransomware webinar
2021 01-27 reducing risk of ransomware webinar2021 01-27 reducing risk of ransomware webinar
2021 01-27 reducing risk of ransomware webinarAlgoSec
 
Data Driven Decisions in DevOps
Data Driven Decisions in DevOpsData Driven Decisions in DevOps
Data Driven Decisions in DevOpsLeon Stigter
 
Simplifying the secure data center
Simplifying the secure data centerSimplifying the secure data center
Simplifying the secure data centerCisco Canada
 




Cloud Practices and Big Data Trends at Trend Micro

  • 1. Cloud Practices in Trend Micro Chung-Tsai Su, Ray Liao Core Tech Trend Micro Inc. 2012/11/06
  • 2. Agenda
      • What is happening in the Real World
      • Big Data in Trend Micro
      • Experience Sharing
          – Galileo15
          – WRS
      • Discussions
      • Q&A
  • 3. What’s happening in the Real World
  • 8. Spam attack on our CEO: Eva Chen, CEO & Co-Founder, Trend Micro
  • 9. Spam attack on me, with links such as:
      http://rebelrowsers.com/AV8z8s/index.html
      http://videospornogratis.com.es/DGx9Zv/index.html
      http://newarkpartytents.com/BvFNK66F/index.html
  • 16. Big Data in Trend Micro
  • 17. Smart Protection Network (SPN) (Date: 2012/09/25)
  • 18. New Approach for Cyber Threat Solution
      • Sources: CDN / xSP, researcher intelligence, honeypot, web crawler
      • Trend Micro Mail Protection, Trend Micro Web Protection, Trend Micro Endpoint Protection
      • 300+ million worldwide sensors
  • 19. SPN Solution Architecture
      • Stages: Sourcing, Processing & Analysis, Validate & Create Solution, Quality Assurance, Solution Distribution, Solution Adoption
      • Data types: File, Web / URL, Email, Domain, IP
      • Services: File Reputation Service, Web Reputation Service, Email Reputation Service, SPN Correlation
      • Smart Protection, Customer, Community Intelligence (feedback loop)
  • 20. Challenges We Face
      • 6 TB of data and 15 billion log lines are received daily
      • It becomes the Big Data challenge!
  • 21. Overview – Smart Feedback Data Source (layers shown: Exposure Layer with Access and Content, Infection Layer, Dynamic Layer)
      Akamai (*): URL users accessed
      NSC_TmProxy_URLF_002: APP accessed malicious URL
      NSC_TmProxy_HFS_001: URL hosted suspicious/malicious file
      AMSP_TMBP_NSC_001: URL hosted shellcode
      BES_001: URL hosted suspicious/malicious content
      SAL_001: URL hosted suspicious/malicious content
      TMASE_001: Email contains suspicious/malicious content
      VSAPI_001: File detected as suspicious/malicious
      CENSUS_001: File executed on endpoint
      AEGIS_001: APP with suspicious/malicious behavior
      RCA_001: Endpoint infection chain
      CONAN_001: File detected by heuristic rules
      CONAN_002: Heuristic rule detection result of a file
      DCE_001: Clean result
      DRE_001, PEDif_001, LCE_001
  • 22. Feedback Source in Terms of Products
      Product columns: Consumer Endpoint (Titanium/TIS), SMB Endpoint (WFBS), Enterprise Endpoint (OSCE), Gateway Products (IMSS/IWSS)
      Schema ID               Marks
      Akamai                  V V V V
      NSC_TmProxy_URLF_002    V V V
      NSC_TmProxy_HFS_001     V (*) V V
      AMSP_TMBP_NSC_001       V (*)
      BES_001                 V
      SAL_001                 V
      TMASE_001               V
      VSAPI_001               V V V
      CENSUS_001              V
      AEGIS_001               V V V
      RCA_001                 V V (*)
      CONAN_001               V
      CONAN_002               V
      DCE_001                 V V V
      DRE_001                 V
      PEDif_001               V
      LCE_001                 V
  • 23. Feedback Volumes (bar chart per schema ID; log-scale axis from 1 to 100,000,000). Schema IDs listed: NSC_TmProxy_URLF_002, AEGIS_001, VSAPI_001, CONAN_002, TMASE_001, DCE_001, CENSUS_001, CONAN_001, RCA_001, LCE_001, SAL_001, NSC_TmProxy_HFS_001, DRE_001, BES_001, AMSP_TMBP_NSC_001, PEDif_001
  • 24. Feedback Statistics
      Source        Unique GUIDs    Share
      Housecall          275,934    1.64%
      Consumer        12,801,624   76.09%
      Enterprise       3,747,248   22.27%
      Total           16,824,806
  • 25. Unique  Endpoints  Count  by  Products
  • 26. Unique Endpoint Counts by Industry (industry category feedback comes only from Enterprise products)
      Not specified   1,564,471   47.43%
      Specified       1,734,141   52.57%
  • 27. SPN High Level Architecture (diagram labels)
      • API Server / Portal (SSO)
      • Data Sourcing: SPN Feedback, Honeypot, CDN/xSP Log
      • Service Platform: MySPN Platform, Log Receiver, Solr Cloud, Log Post-processing, Web Pages
      • Hadoop Distributed File System (HDFS)
      • Correlation Platform: Threat Connect, Census, DRR, Tracking, Global Object Cache, Akame Logging System
      • Adhoc-Query (Pig), MapReduce, Oozie, HBase
      • Trend Message Exchange (Message Bus)
      • Email Reputation Service, File Reputation Service, Web Reputation Service, 3rd-Party Data Feed
  • 28. Service Stack of SPN (diagram labels): SAL/MKT, TS, RD, Consumer, Enterprise; Internal Customer, External Customer; Threat Landscape, Risk Management, User Experience; Service Catalogue: MagicQ, ZDASE, Census, APT, Report, Widget; Global Intelligence Network: Entity, Web, Mobile, Correlation; Cloud Infra; Infrastructure; Data Catalogue: Akamai, Zone Files, FRS, WRS, Census, Feedback, ERS, SPN; Cooked Data Feeds, Raw Data Feeds
  • 29. SPN Ecosystem (diagram labels): API, OLAP System, MySPN Framework, Web Frontend, Data Mining, Solr, Sourcing, RDB, Adhoc-Query, TME, Oozie, Scribe, Pig, Hive, Avro, Protobuf, Flume, Streaming, MapReduce Engine, Hadoop, HCatalog, Data Inputs, Data Outputs, HDFS, HBase, OLTP System, Middleware / DB / K-V Stores, Web Frontend
  • 32. Chung-Tsai Su, Spark Tsao, Wynne Chu, and Ray Liao
  • 33. 20 years ago, a young man said: “Let’s fight bad guys”
  • 34. At that time, bad guys appeared one by one (1 on 1)
  • 35. In recent years, bad guys mutate themselves … (1 on 10)
  • 36. Nowadays, bad guys adopt community attacks … (SuPaWind). We need a community-based solution
  • 39. Huge amount of mass data
      Source / Category       Number (thousands/day)   Size (MB/day)
      SPN Feedback log                        17,000          11,000
      WRS Web log                          4,500,000       1,500,000
      GPServer Crawl                         300,000             N/A
      FRS File sourcing                           30          14,000
      Honeypot                                78,000         200,000
      ERS IP level queries                 1,200,000          46,000
  • 40. What is the best data structure to describe a “Community”? Hash, Tree, Sequence, or Clique: Clique
  • 41. Clique: botnet, fast-flux, phishing (facevook.com, facebouk.com, facenook.com)
  • 43. Galileo15 Makes it Possible!!!
      • 2 observations from the data
          – Sparse connection with low diameter preference
          – Incomplete connection (missing edges)
      • Domains: fahrzeugteile.shop.ebay.de, shop.ebay.ca, videogames.shop.ebay.com.au
      • IPs: 66.135.202.89, 66.135.205.141, 66.135.213.211, 66.135.213.215, 66.211.160.11, 66.211.180.27
  • 44. Galileo15: transform mass raw data into community structures
      • Hosts are domains and IPs
      • Domains: fahrzeugteile.shop.ebay.de, shop.ebay.ca, videogames.shop.ebay.com.au
      • IPs: 66.135.202.89, 66.135.205.141, 66.135.213.211, 66.135.213.215, 66.211.160.11, 66.211.180.27
  • 46. Architecture of Galileo15: Clique Enumeration, Clique Matching, Clique Ranking
  • 49. Architecture of Galileo15: Clique Matching compares cliques at Time0 and Time1
  • 51. Architecture of Galileo15: Clique Ranking
      • Static Rank
      • Dynamic Rank
  • 53. Clique Enumeration reduces workload (40.3%)
      • Input: 1 hour of web browsing log, 180 million logs
      • Hadoop with 7 reducers on 40 machines (runs at WRS ALPS Env.): < 5 minutes
      • Output: 700,000 cliques
  • 54. Clique Matching saves computational consumption (run time)
      • Brute force: > 1 day
      • Hash-based: 20 mins
      • Multi-layer hash-based: < 2 mins (90.8% less)
      • The chart also marks 15 mins
  • 56. Why “Community that Fits”? One record: Server420.at.youporn.com, 87.248.207.141
  • 57. Why “Community that Fits”? The community around that record:
      • Domains: Server114.at.youporn.com, Server346.at.youporn.com, Server420.at.youporn.com, Server730.at.youporn.com, Server923.at.youporn.com
      • IPs: 203.77.186.249, 69.164.22.140, 69.164.22.153, 69.164.22.154, 87.248.203.50, 87.248.207.141, 87.248.210.147, 87.248.211.194, 87.248.211.223, 87.248.212.55, 87.248.218.132
  • 58. Why “Community that Fits”? The same domains and IPs, annotated with labels: WTP, DUL from ERS, Malicious, Phishing
  • 60. Some porn websites are not blocked but caught by Galileo15 amateurmaturevoyeur.pornblink.com bareasswhipping.pornblink.com WTP desihotpoint.com freexxxamaturefucking.pornblink.com Phishing fxxkinsilly.com goldengatebridgebuilt.pornblink.com 203.77.186.249 hotolderwomenshowingpants.pornblink.com matureamateurgallerysoftcore.pornblink.com Malicious skinnyteenanallesbian.pornblink.com spermster.com Pornography Pornography 60
  • 61. Applications: Clique Enumeration, Clique Matching, Clique Ranking over Domain–IP snapshots at T0, T0+15, T0+30, T0+45, T0+60 (chart: #Cliques over time)
  • 63. Applications
      1. Whitelisting
      2. Anomaly detection
      3. Web Hosting
      4. Fast Flux
  • 64. More?
  • 65. History Evolution (timeline axis: 1980, 1990, 2000, 2010): Clique, Sequence, Clustering, Classification
  • 66. Summary
      • Propose a brand-new community representation
      • Provide a powerful graph-based correlation engine
      • Reduce workload by 40.3%
      • Bring huge business value
  • 67. Q&A
  • 68. ALGORITHM
  • 69. Clique Enumeration algorithm (1/2): input, map, reduce, output
      • Input: (Domain, IP) pairs: Domain1, IP1; Domain2, IP2; Domain3, IP3; …; Domainn, IPn
      • Map, then shuffling by key and sorting by key
      • Reduce output: one line per domain with its IPs, e.g. Domain1; IP1,1, IP1,2, …
  • 70. Clique Enumeration algorithm (2/2): input, map, reduce, output
      • Input: the per-domain lines from the first job, e.g. Domaini; IPi,1, IPi,2, …
      • Map, then shuffling by key and sorting by key
      • Reduce
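The two passes above can be imitated in memory with plain Python. This is only a sketch under stated assumptions: the first pass groups (domain, IP) pairs by domain, as the slides show; the second pass, whose output the slides do not show, is assumed here to group domains that resolve to exactly the same IP set. The function names and the sample records are hypothetical.

```python
from collections import defaultdict

def first_pass(pairs):
    """Pass 1 (as on the slides): group (domain, ip) records by domain."""
    grouped = defaultdict(set)
    for domain, ip in pairs:          # map: emit (domain, ip)
        grouped[domain].add(ip)       # shuffle + reduce: collect IPs per domain
    return {d: frozenset(ips) for d, ips in grouped.items()}

def second_pass(domain_to_ips):
    """Pass 2 (assumption): group domains that share an identical IP set."""
    grouped = defaultdict(set)
    for domain, ips in domain_to_ips.items():   # map: emit (ip_set, domain)
        grouped[ips].add(domain)                # reduce: collect domains per IP set
    return [(sorted(domains), sorted(ips)) for ips, domains in grouped.items()]

# Hypothetical sample records using documentation IP ranges.
records = [
    ("a.example", "192.0.2.1"), ("a.example", "192.0.2.2"),
    ("b.example", "192.0.2.1"), ("b.example", "192.0.2.2"),
    ("c.example", "198.51.100.7"),
]
communities = second_pass(first_pass(records))
for domains, ips in sorted(communities):
    print(domains, ips)
```

Grouping on identical IP sets finds only exact matches; the density parameter on the next slide suggests the real algorithm also tolerates missing edges.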
  • 71. Parameters of Clique Enumeration algorithm
      • γ: density of edges of a quasi-clique
          – |E| ≥ γ |L| |R|
      • MinE: minimum support of each edge
          – #E(li, rj) ≥ MinE
      • MinL, MaxL: minimum and maximum number of objects on the left side of a clique
          – MinL ≤ |L| ≤ MaxL
      • MinR, MaxR: minimum and maximum number of objects on the right side of a clique
          – MinR ≤ |R| ≤ MaxR
      • Min_DegL, Min_DegR: minimum degrees of objects on the left and right sides of a clique, respectively
          – Deg(li) ≥ Min_DegL ∀ li ∈ L; Deg(rj) ≥ Min_DegR ∀ rj ∈ R
      • Example G(L, R, E): L = {l1, l2, l3, l4}, R = {r1, r2, r3}, E ⊆ {(li, rj) | 1 ≤ i ≤ 4, 1 ≤ j ≤ 3}, |L| = 4, |R| = 3, Deg(l1) = 2, Deg(l2) = 3
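The constraints above can be checked mechanically. The following is a minimal sketch, not the Galileo15 implementation: the function name, the argument names, and the choice to drop edges whose support is below MinE (rather than reject the whole clique) are assumptions. The example graph keeps Deg(l1) = 2 and Deg(l2) = 3 from the slide; the edges of l3 and l4 are made up.

```python
def is_quasi_clique(L, R, edge_support, gamma, min_e=1,
                    min_l=1, max_l=10**9, min_r=1, max_r=10**9,
                    min_deg_l=1, min_deg_r=1):
    """Check the slide's constraints on a bipartite subgraph G(L, R, E).

    edge_support maps (l, r) -> number of observations of that edge.
    """
    # Assumption: edges with support below MinE are ignored.
    E = {(l, r) for (l, r), n in edge_support.items()
         if l in L and r in R and n >= min_e}
    # Size bounds: MinL <= |L| <= MaxL and MinR <= |R| <= MaxR.
    if not (min_l <= len(L) <= max_l and min_r <= len(R) <= max_r):
        return False
    # Density: |E| >= gamma * |L| * |R|.
    if len(E) < gamma * len(L) * len(R):
        return False
    # Minimum degree on both sides.
    deg_l = {l: sum(1 for (a, _) in E if a == l) for l in L}
    deg_r = {r: sum(1 for (_, b) in E if b == r) for r in R}
    return (all(d >= min_deg_l for d in deg_l.values())
            and all(d >= min_deg_r for d in deg_r.values()))

L = {"l1", "l2", "l3", "l4"}
R = {"r1", "r2", "r3"}
support = {  # 8 of the 12 possible edges, each seen once
    ("l1", "r1"): 1, ("l1", "r2"): 1,
    ("l2", "r1"): 1, ("l2", "r2"): 1, ("l2", "r3"): 1,
    ("l3", "r2"): 1, ("l3", "r3"): 1,
    ("l4", "r3"): 1,
}
print(is_quasi_clique(L, R, support, gamma=0.6))
print(is_quasi_clique(L, R, support, gamma=0.7))
```

With 8 of 12 edges present the density is about 0.67, so the graph passes at γ = 0.6 and fails at γ = 0.7.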
  • 72. Specification for Hadoop Environment
      Number of machines   40
      Machine type         Dell PE2950
      CPU                  Quad-core Xeon 5410 x 2
      RAM                  4 GB (667 MHz) x 2
      Disk                 300 GB SATA 7.2K x 6
      OS                   RHEL AS4, 32-bit
  • 74. Time consumption on Clique Enumeration algorithm (1/4) (chart: time in seconds vs. number of reducers)
  • 75. Time consumption on Clique Enumeration algorithm (2/4), times in seconds
      Reducers   1st mapper   1st reducer   2nd mapper   2nd reducer   Total time
       1         27           1201          52           97            1377
       2         27            556          27           54             664
       3         27            357          18           39             441
       4         27            306          15           33             381
       5         27            249          12           30             318
       6         27            225          12           27             291
       7         27            195           9           24             255
       8         27            193           9           23             252
       9         27            178           9           22             236
      10         27            165           9           21             222
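To read the table as a scaling curve, the total times can be turned into speedup and efficiency figures. The numbers below are copied from the table; the helper names are hypothetical.

```python
# Total time in seconds per number of reducers, copied from the table above.
total_time = {1: 1377, 2: 664, 3: 441, 4: 381, 5: 318,
              6: 291, 7: 255, 8: 252, 9: 236, 10: 222}

def speedup(n):
    """Speedup of n reducers relative to a single reducer."""
    return total_time[1] / total_time[n]

def efficiency(n):
    """Speedup divided by the number of reducers (1.0 would be ideal)."""
    return speedup(n) / n

for n in sorted(total_time):
    print(f"{n:2d} reducers: speedup {speedup(n):.2f}, efficiency {efficiency(n):.2f}")
```

On these figures the gain flattens around 7 reducers (255 s, against 252 s with 8), which matches the 7 reducers quoted on slide 53.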
  • 76. Time consumption on Clique Enumeration algorithm (3/4) (chart: number of logs and time in seconds vs. hours of input: 1, 2, 4, 8, 16)
  • 77. Time consumption on Clique Enumeration algorithm (4/4), times in seconds
      Hours   #records        #cliques    Mappers   1st map   1st reduce   2nd map   2nd reduce   Total time
       1        182,642,849     730,651       416   27          195         9          24          293
       2        375,836,783   1,008,351       849   27          300        15          33          505
       4        763,789,635   1,323,948      1717   27          739        24          57          990
       8      1,556,210,147   1,834,466      3466   27         1810        36          84         2270
      16      3,773,804,326   2,518,523      8280   27         4203        69         188         5304
  • 78. Multi-Layer Hash-based Matching Algorithm (diagram: cliques at T0 and T1 are looked up through hash tables layered by clique size: Size=1, Size=2~5, Size>6)
  • 79. Community Matching Algorithm (1/2)
      Algorithm                 Time (sec.)   #Clique-pairs
      Brute force                   129,630         483,919
      Hash-based                      1,194         483,919
      Multi-layer hash-based            110         424,213
      Matching cliques between Time0 and Time1, the multi-layer hash-based algorithm takes 90.8% less time than the hash-based one.
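The slides give the layers and the timings but not the matching procedure itself, so the following is a guess at the idea rather than the authors' algorithm. It assumes that cliques are sets of node names, that a T0 clique is compared only with T1 cliques in the same size layer that share at least one member, and that similarity is the Jaccard index. Sizes of exactly 6 are placed in the largest layer, which is also an assumption.

```python
from collections import defaultdict

def layer(clique):
    """Size layer of a clique, following the slide's buckets (1, 2~5, larger)."""
    n = len(clique)
    return 0 if n == 1 else 1 if n <= 5 else 2

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)

def match(cliques_t0, cliques_t1, min_similarity=0.5):
    """Pair each T0 clique with T1 cliques in the same layer that share a member."""
    # One hash table per layer: member -> indices of T1 cliques containing it.
    tables = defaultdict(lambda: defaultdict(set))
    for j, c in enumerate(cliques_t1):
        for member in c:
            tables[layer(c)][member].add(j)
    pairs = []
    for i, c in enumerate(cliques_t0):
        candidates = set()
        for member in c:
            candidates |= tables[layer(c)].get(member, set())
        for j in sorted(candidates):
            s = jaccard(c, cliques_t1[j])
            if s >= min_similarity:
                pairs.append((i, j, s))
    return pairs

# Hypothetical cliques mixing domains and IPs.
t0 = [frozenset({"a.example", "192.0.2.1"}), frozenset({"c.example"})]
t1 = [frozenset({"a.example", "192.0.2.1", "192.0.2.2"}),
      frozenset({"c.example"}),
      frozenset({"d.example", "198.51.100.9"})]
pairs = match(t0, t1)
print(pairs)
```

Restricting candidates to one layer avoids most comparisons but can miss pairs whose sizes fall in different layers, which may be why the table reports fewer clique-pairs for the multi-layer variant.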
  • 80. Community Matching Algorithm (2/2) (chart: number of clique-pairs, log scale from 1 to 1,000,000, vs. similarity from 0% to 100%, for hash-based and multi-layer hash-based)
  • 82. Web Reputation System (2012/11/8, Confidential | Copyright 2012 Trend Micro Inc.)
  • 83. Big World, Big Data
      • Important numbers for WRS
          – 8 billion queries daily
          – 900 million URLs analyzed daily
          – < 0.01% of daily URLs identified as malicious
      • Finding a needle in a haystack
  • 84. Processing Big Data
      • Content analysis: 900 million unique URLs / 24 hr ≈ 10K URLs per second (0.1 ms per URL)
          – Challenge: how to coordinate, maintain, and distribute work among a large set of machines (> 500 machines)?
      • Raw log analysis: 3 terabytes of data each day
          – Challenge: how to store it in a way that is reliable and fast for retrieving relevant data?
          – How to process logs (present + historical, ~500 TB) to provide vital statistics and trends?
      • Diagram figures: user queries, 8 billion URLs per day; unique URLs, 900 million per day; malicious URLs found by content analysis, 19K per day; raw log, 3 terabytes per day, feeding the present view, historical trend, vital statistics, and anomaly detection
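The per-second and per-URL figures follow from simple division, shown here as a short check. The input numbers are the slide's; the variable names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

unique_urls_per_day = 900_000_000      # content analysis load, from the slide
queries_per_day = 8_000_000_000        # user queries, from the slide

urls_per_second = unique_urls_per_day / SECONDS_PER_DAY
queries_per_second = queries_per_day / SECONDS_PER_DAY
# Time budget per URL if URLs were analyzed strictly one at a time.
ms_per_url = 1000 / urls_per_second

print(f"{urls_per_second:,.0f} unique URLs per second")
print(f"{queries_per_second:,.0f} queries per second")
print(f"{ms_per_url:.3f} ms per URL")
```

The 0.1 ms budget is a system-wide average; spread over the > 500 machines mentioned above, each machine would have roughly 500 times longer per URL.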
  • 85. Today’s Agenda
      • Discussion of the real-world design
          – Constraints
          – Requirements
      • Sample of tools available
          – When to use them?
          – How?
  • 86. What are we trying to do with Big Data?
  • 87. Usage Triangle
      • Historical domain–IP relations
      • Historical access patterns
      • Known malicious actors …
      • Detects abnormal behavior
      • Groups malicious domains
      • Potential malicious URLs …
      • Malicious activities?
  • 88. Constraints Triangle
      • What data to store?
      • How much data to store?
      • For how long?
      • Readily accessible
      • $$$
      • Threat coverage
      • How fast can discovery be?
  • 89. CLS Observation
      • Like the CAP theorem, where one can only satisfy 2 out of 3 constraints, one can only satisfy 2 out of 3 constraints (coverage, latency, storage) when working on threat discovery.
          – (Coverage+, Latency+): It is impossible to achieve fast discovery and large coverage without an enormous data store to provide the necessary information for decision making.
          – (Latency+, Storage+): By focusing on a smaller set of URLs, we can provide fast discovery without the need for a huge data store.
          – (Coverage+, Storage+): By allowing a longer discovery time, we can enhance the coverage without using a large data store.
  • 90. It is all about the trade-off
  • 91. Two schools of thought (1/2)
      • (Storage+, Latency+)
          – Attacks are
              • Wave-like in nature
                  – Sudden appearance
                  – Short lifespan
              • Disposable
                  – Use once and throw away
              • Regionalized
                  – Global epidemics are less common
              • Few
                  – < 0.01% of the daily unique URLs are malicious
  • 92. Streaming Example
  • 93. Two schools of thought (2/2)
      • (Coverage+, Latency+)
          – History repeats itself
              • So does the hacker’s infrastructure (not so throwaway)
          – Protecting coverage is essential
              • Detectable by more thorough investigation with larger context
          – Future-proof
              • Our solution reflects past knowledge
              • If we don’t accumulate, adapt, and evolve our knowledge, our solution will become obsolete
  • 94. Batch Example
  • 95. It Boils Down to Streaming vs. Batch Processing
      • Streaming looks at queries in real time
          – Filters out unneeded URLs
          – Processes suspicious URLs only
          – Kafka, S4, Trend Messaging Exchange
      • Batch processing
          – Not real-time
          – Broader scope
          – Hadoop MapReduce
  • 96. Streaming Big Data
      • A URL and its value are ephemeral
          – Need to act fast
          – No need to store them
      • Useful data are few and far between
          – Filter the rest out
      • Apply the Unix pipe concept, distributed style
          – Message-oriented programming
  • 97. What is Message Oriented Programming?
  • 98. Traditionally
      • Tightly coupled
          – Fixed service location
          – Protocol specific
          – Difficult to change/adapt to new business requirements
      • Lack of separation between
          – Network handling
          – Application logic
  • 99. Mixing network and application logic
      • Complex
      • Time wasted

      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <arpa/inet.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      int main(void) {
          struct sockaddr_in stSockAddr;
          /* Create a TCP socket */
          int SocketFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
          /* Listen on port 5566 on all local interfaces */
          memset(&stSockAddr, 0, sizeof(stSockAddr));
          stSockAddr.sin_family = AF_INET;
          stSockAddr.sin_port = htons(5566);
          stSockAddr.sin_addr.s_addr = htonl(INADDR_ANY);
          bind(SocketFD, (const struct sockaddr *)&stSockAddr, sizeof(stSockAddr));
          listen(SocketFD, 10);
          /* Block until a client connects */
          int ConnectFD = accept(SocketFD, NULL, NULL);
          /* do something */
          close(ConnectFD);
          close(SocketFD);
          return 0;
      }
  • 102. Is that enough?
      • Protocol independence
      • Location independence
          – URL vs. Channel ID
      • Direct vs. indirect connection
          – Replacing connection to server with connection to message bus
  • 103. Further encapsulation
      • To attach to the message bus:
          – message-source | your-app-here | message-sink
          – message-source | app-1-here | app-2-here | message-sink
      • Just like the Unix pipe concept
          – cat log.txt | gawk '{print $1}' | sort -u
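A pipeline stage in this style only needs to read messages from a source and write results to a sink. The sketch below is an illustration of that idea, not TME code: the `suspicious` rule and the sample URLs are hypothetical, and in-memory streams stand in for the message source and sink.

```python
import io

def suspicious(url):
    """Hypothetical predicate; a real stage would apply its own rule."""
    return url.endswith(".exe")

def run(source, sink):
    """Read one message per line from source and forward only suspicious ones."""
    for line in source:
        url = line.strip()
        if url and suspicious(url):
            sink.write(url + "\n")

# In-memory streams stand in for message-source and message-sink.
source = io.StringIO("http://a.example/index.html\nhttp://b.example/setup.exe\n")
sink = io.StringIO()
run(source, sink)
print(sink.getvalue(), end="")
```

Passing `sys.stdin` and `sys.stdout` as source and sink would let the same function sit between two other programs in a shell pipeline; the stage itself contains no network code.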
  • 104. Messaging code is as simple as

      #include <iostream>
      #include <string>

      int main() {
          std::string name;
          std::cin >> name;                   // read one token from standard input
          std::cout << "Hello! " << name;     // write the reply to standard output
      }
  • 105. Conceptually it is still data flow
      • Each blue arrow is now a message channel / queue.
      • Each component can be in a different location and dynamically rearranged with minimum effort.
  • 106. Intra-PC vs. Extra-PC messaging
  • 107. Coordinating tools (1/2)
  • 108. Coordinator (2/2)
  • 109. It is not a pipe dream
  • 110. Scalability
      • Wait, we are dealing with Big Data here!
  • 111. Scalability
      • The message bus becomes the bottleneck
          – Each blue arrow represents input/output to the message bus
  • 112. Partitioning Message Bus (1/2)
      • Partition
          – Spread out channels across different message servers
          – Load balance
          – Avoid network bottleneck
          – Increase the number of channels the system can handle
      • Because of messaging encapsulation
          – Server selection and load balancing are automatic.
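One simple way to spread channels across message servers is to hash the channel ID. The slides do not say how TME selects a server, so this is an assumed scheme shown only to make the idea concrete; the server names and channel IDs are hypothetical.

```python
import hashlib

def pick_server(channel_id, servers):
    """Map a channel ID to one server using a stable hash of the ID."""
    digest = hashlib.sha1(channel_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["mq-1", "mq-2", "mq-3"]                # hypothetical message servers
channels = [f"channel-{i}" for i in range(1000)]  # hypothetical channel IDs

# Count how many channels land on each server.
load = {s: 0 for s in servers}
for ch in channels:
    load[pick_server(ch, servers)] += 1
print(load)
```

Because every client computes the same mapping, no lookup service is needed. The drawback of plain modulo hashing is that most channels move when a server is added or removed; consistent hashing is the usual alternative when that matters.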
  • 113. Partitioning Message Bus (2/2)
  • 114. Message Oriented Programming Tips
  • 115. Parallel Upgrade (1/2)
  • 116. Parallel Upgrade (2/2)
  • 117. Sharing Context
  • 118. How does WRS do it?
  • 119. Big Data Tools
      • In-house solutions
          – Trend Messaging Exchange
              • Coordinates and distributes work among a large set of machines
              • Enhanced scalability & reliability
              • Open sourced: https://github.com/trendmicro/tme/wiki
          – Lumber Jack: ultra-high-efficiency indexing system
              • Structures logs, allowing < 10 seconds retrieval of vital statistics and information
                  – Traditional scanning methods require > 10 minutes to days
                  – 60 times savings in time
              • Highly specialized for Trend’s tasks
      • Community-supported projects
          – Trend-customized Hadoop/HBase data storage
              • Involved with the HBase steering committee
          – Contribute to the open source community
  • 120. Big Data Begets Big Data, aka Business Intelligence
      • We have built a large infrastructure for processing big data.
          – Big data generates big data, which generates business intelligence
          – For example: 8 billion URLs flowing through the system
              • 8 billion URLs flowing through 100 nodes will generate 800 billion log entries (a conservative estimate)
              • Business intelligence extraction
  • 122. Scale-up vs. Scale-out http://natishalom.typepad.com/.a/6a00d835457b7453ef01348697aa8a970c-pi
  • 123. SQL vs. NoSQL http://community.sageaccpac.com/blogs/r_and_d/archive/2012/01/28/nosql-for-erp.aspx
  • 124. Public Cloud vs. Private Cloud
  • 125. Getting to know cloud computing again through five eliminations
      • The cloud is not a place
      • The cloud is not the same as server virtualization
      • The cloud does not operate as an isolated island
      • The cloud is not a top-down development
      • The cloud is not just talk
      http://www.bnext.com.tw/focus/view/cid/103/id/23682