
2012-11-16 Cloud Practices in Trend Micro 2012 - Chung-Tsai Su



  1. Cloud Practices in Trend Micro. Chung-Tsai Su, Ray Liao. Core Tech, Trend Micro Inc. 2012/11/06
  2. Agenda: What is happening in the Real World; Big Data in Trend Micro; Experience Sharing (Galileo15, WRS); Discussions; Q&A
  3. What's Happening in the Real World
  4. Real World
  5. http://www.taiwannews.com.tw/etn/news_content.php?id=2017595
  6. http://www.17inda.com/html/3/article-1937.html
  7. Spam attack on our CEO: Eva Chen, CEO & Co-Founder, Trend Micro
  8. Spam attack on me: http://rebelrowsers.com/AV8z8s/index.html, http://videospornogratis.com.es/DGx9Zv/index.html, http://newarkpartytents.com/BvFNK66F/index.html
  9. YAHOO 攝影聯合會 (Yahoo Photography Federation)
  10. http://www.bnext.com.tw/article/view/cid/103/id/24959
  11. Big Data in Trend Micro
  12. Smart Protection Network (SPN). Date: 2012/09/25
  13. New Approach for Cyber Threat Solution: 300+ million worldwide sensors, including CDN/xSP, researcher intelligence, honeypots, web crawlers, Trend Micro Mail Protection, Trend Micro Endpoint Protection, and Trend Micro Web Protection.
  14. SPN Solution Architecture (diagram): a pipeline of Sourcing → Processing & Analysis → Validate & Create Solution → Quality Assurance → Solution Distribution → Solution Adoption. Inputs (file, web/URL, email, domain, IP) feed the File Reputation Service, Web Reputation Service, Email Reputation Service, and SPN Correlation, delivered to customers as Smart Protection, with community intelligence as a feedback loop.
  15. Challenges We Face: 6 TB of data and 15 billion lines of logs are received daily. It has become a Big Data challenge!
  16. Overview – Smart Feedback Data Sources
     • Exposure Layer
       – Access: Akamai (*): URLs users accessed; NSC_TmProxy_URLF_002: APP accessed malicious URL
       – Content: NSC_TmProxy_HFS_001: URL hosted suspicious/malicious file; AMSP_TMBP_NSC_001: URL hosted shellcode; BES_001: URL hosted suspicious/malicious content; SAL_001: URL hosted suspicious/malicious content; TMASE_001: email contains suspicious/malicious content
     • Infection Layer: VSAPI_001: file detected as suspicious/malicious; CENSUS_001: file executed on endpoint; AEGIS_001: APP with suspicious/malicious behavior; RCA_001: endpoint infection chain
     • Dynamic Layer: CONAN_001: file detected by heuristic rules; CONAN_002: heuristic rule detection result of a file; DCE_001: clean result; DRE_001, PEDif_001, LCE_001
  17. Feedback Sources by Product (table): which schema IDs are fed back by Gateway (IMSS/IWSS), Consumer Endpoint (Titanium/TIS), SMB Endpoint (WFBS), and Enterprise Endpoint (OSCE). Akamai, NSC_TmProxy_URLF_002, NSC_TmProxy_HFS_001, VSAPI_001, AEGIS_001, and DCE_001 are fed back by multiple product lines; most of the remaining schema IDs come from a single product each.
  18. Feedback Volumes (chart): daily volume per schema ID on a log scale from 1 to 100,000,000, from NSC_TmProxy_URLF_002 (highest) down to PEDif_001 (lowest).
  19. Feedback Statistics (unique GUIDs): Housecall 275,934 (1.64%); Consumer 12,801,624 (76.09%); Enterprise 3,747,248 (22.27%); total 16,824,806.
  20. Unique Endpoint Counts by Product (chart)
  21. Unique Endpoint Counts by Industry (industry category fed back only by Enterprise products): Not specified 1,564,471 (47.43%); Specified 1,734,141 (52.57%).
  22. SPN High-Level Architecture (diagram): API Server/Portal (SSO); log data sourcing from SPN Feedback, Honeypot, and CDN/xSP; Log Receiver and log post-processing; Service Platform and MySPN Platform with Solr Cloud and web pages; Hadoop Distributed File System (HDFS); Correlation Platform, Threat Connect, Census Tracking, DRR, Global Object Cache, and Akamai Logging System; ad-hoc query (Pig), MapReduce, Oozie, and HBase; Trend Message Exchange (message bus) connecting the Email, File, and Web Reputation Services and 3rd-party data feeds.
  23. Service Stack of SPN (diagram): raw data feeds (Akamai, zone files, Census, feedback) and cooked SPN data feeds at the bottom; a data catalogue (FRS, WRS, ERS) on cloud infrastructure; a global intelligence network with entity, web, mobile, and correlation layers; and a service catalogue (MagicQ, ZDASE, Census, APT Report, Widget, Threat Landscape, Risk Management, User Experience) serving internal customers (SAL/MKT, TS, RD) and external customers (Consumer, Enterprise).
  24. SPN Ecosystem (diagram): data inputs flow through sourcing tools (Scribe, Flume, TME) with Avro/Protobuf serialization into Hadoop (HDFS, HBase, MapReduce, streaming engine, Pig, Hive, Oozie, HCatalog), feeding an OLAP system, data mining, Solr, RDB, and ad-hoc query under the MySPN framework and API, plus an OLTP system with middleware / DB / key-value stores and web frontends for data outputs.
  25. Experience Sharing: Galileo15
  26. Chung-Tsai Su, Spark Tsao, Wynne Chu, and Ray Liao
  27. 20 years ago, a young man said: "Let's fight bad guys!"
  28. At that time, bad guys appeared one by one: 1-on-1.
  29. In recent years, bad guys mutate themselves: 1-on-10.
  30. Nowadays, bad guys adopt community attacks (e.g., SuPaWind). We need a community-based solution.
  31. Outline: Challenges / Solution / Applications
  32. Challenges (outline: Challenges / Solution / Applications)
  33. Huge amount of mass data, by source and category (number in thousands/day, size in MB/day):
     • SPN feedback log: 17,000; 11,000
     • WRS web log: 4,500,000; 1,500,000
     • GPServer crawl: 300,000; N/A
     • FRS file sourcing: 30; 14,000
     • Honeypot: 78,000; 200,000
     • ERS IP-level queries: 1,200,000; 46,000
  34. What is the best data structure to describe a "community"? Hash? Tree? Sequence? → Clique.
  35. Clique examples (diagram): a botnet clique of look-alike domains (facevook.com, facebouk.com, facenook.com), plus fast-flux and phishing communities.
  36. NP-Hard
  37. Galileo15 Makes It Possible! Two observations from the data: sparse connections with a low-diameter preference, and incomplete connections. Example (diagram): domains fahrzeugteile.shop.ebay.de, shop.ebay.ca, and videogames.shop.ebay.com.au map to IPs 66.135.202.89, 66.135.205.141, 66.135.213.211, 66.135.213.215, 66.211.160.11, and 66.211.180.27, with some edges missing.
  38. Galileo15 transforms mass raw data into community structures: host-to-host domain-IP relations (the same eBay domains and IPs) become one community.
  39. Solution (outline: Challenges / Solution / Applications)
  40. Architecture of Galileo15: Clique Enumeration → Clique Matching → Clique Ranking.
  43. Architecture of Galileo15: Clique Matching compares the cliques enumerated at Time0 against those at Time1.
  45. Architecture of Galileo15: Clique Ranking assigns each clique a static rank and a dynamic rank.
  47. Clique Enumeration reduces workload: 1 hour of web-browsing logs (180 million logs) is reduced to 700,000 cliques by Hadoop in under 5 minutes with 7 reducers, running in the WRS ALPS environment on 40 machines; a 40.3% workload reduction.
  48. Clique Matching saves computation: brute force takes more than 1 day, hash-based about 20 minutes, and multi-layer hash-based under 2 minutes (a 90.8% saving).
  49. Applications (outline: Challenges / Solution / Applications)
  50. Why "community that fits"? Start from a single observed pair: Server420.at.youporn.com and 87.248.207.141.
  51. Why "community that fits"? The full community: Server114, Server346, Server420, Server730, and Server923.at.youporn.com map onto 203.77.186.249, 69.164.22.140, 69.164.22.153, 69.164.22.154, 87.248.203.50, 87.248.207.141, 87.248.210.147, 87.248.211.194, 87.248.211.223, 87.248.212.55, and 87.248.218.132.
  52. Why "community that fits"? Members of this community already carry verdicts from other sources: WTP, DUL from ERS, malicious, and phishing.
  54. Some porn websites are not otherwise blocked but are caught by Galileo15 through community membership (sharing 203.77.186.249 with WTP, phishing, and malicious members): amateurmaturevoyeur.pornblink.com, bareasswhipping.pornblink.com, desihotpoint.com, freexxxamaturefucking.pornblink.com, fxxkinsilly.com, goldengatebridgebuilt.pornblink.com, hotolderwomenshowingpants.pornblink.com, matureamateurgallerysoftcore.pornblink.com, skinnyteenanallesbian.pornblink.com, spermster.com.
  57. Applications (diagram): domain-IP cliques tracked over time (T0, T0+15, T0+30, T0+45, T0+60 minutes) enable 1) whitelisting, 2) anomaly detection, 3) web hosting identification, and 4) fast-flux detection.
  58. More?
  59. History of evolution (timeline, 1980-2010): clustering, classification, sequence, clique.
  60. Summary: propose a brand-new community representation; provide a powerful graph-based correlation engine; reduce workload by 40.3%; bring huge business value.
  61. Q&A
  62. ALGORITHM
  63. Clique Enumeration algorithm (1/2) (diagram): mappers read (Domain, IP) records; records are shuffled and sorted by key; each reducer outputs one line per domain, "Domain_i; IP_i,1, IP_i,2, ...".
  64. Clique Enumeration algorithm (2/2) (diagram): a second MapReduce round takes the per-domain IP lists as input, shuffles and sorts them by key again, and reduces them into the enumerated cliques.
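The two diagrams above can be read as a two-round MapReduce job. A minimal single-process sketch of that reading follows; the choice of grouping keys and the rule that domains sharing an identical IP set form one bipartite clique are my own interpretation of the diagrams, not stated on the slides:

```python
from collections import defaultdict

def enumerate_cliques(log_pairs):
    """Two-round grouping in the spirit of the MapReduce diagrams.

    Round 1: shuffle (domain, ip) records by domain and reduce them
    into per-domain IP lists ("Domain_i; IP_i,1, IP_i,2, ...").
    Round 2: re-key each domain by its full IP set; domains sharing an
    identical IP set form one bipartite clique (domains x IPs).
    """
    # round 1: map emits (domain, ip); reduce groups IPs per domain
    ips_by_domain = defaultdict(set)
    for domain, ip in log_pairs:
        ips_by_domain[domain].add(ip)

    # round 2: map emits (ip_set, domain); reduce groups domains per IP set
    cliques = defaultdict(set)
    for domain, ips in ips_by_domain.items():
        cliques[frozenset(ips)].add(domain)
    return cliques

# toy log: the two eBay hosts resolve to the same pair of IPs
logs = [
    ("fahrzeugteile.shop.ebay.de", "66.135.202.89"),
    ("fahrzeugteile.shop.ebay.de", "66.135.205.141"),
    ("shop.ebay.ca", "66.135.202.89"),
    ("shop.ebay.ca", "66.135.205.141"),
    ("example.org", "93.184.216.34"),
]
cliques = enumerate_cliques(logs)
```

On Hadoop, each grouping step becomes a shuffle; the in-memory dictionaries here stand in for the shuffle-and-sort phases shown in the diagrams.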
  65. Parameters of the Clique Enumeration algorithm, for a bipartite graph G(L, R, E):
     • γ: edge density of the quasi-clique, |E| ≥ γ|L||R|
     • MinE: minimum support of each edge, #E(li, rj) ≥ MinE
     • MinL, MaxL: minimum and maximum number of objects on the left side of a clique, MinL ≤ |L| ≤ MaxL
     • MinR, MaxR: minimum and maximum number of objects on the right side of a clique, MinR ≤ |R| ≤ MaxR
     • Min_DegL, Min_DegR: minimum degrees of objects on the left and right side, respectively: Deg(li) ≥ Min_DegL ∀li ∈ L; Deg(rj) ≥ Min_DegR ∀rj ∈ R
     Example: L = {l1, l2, l3, l4}, R = {r1, r2, r3}, |L| = 4, |R| = 3, Deg(l1) = 2, Deg(l2) = 3.
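The parameter checks above are straightforward to express in code. A sketch of the acceptance test for a candidate quasi-clique (the example edge set and all threshold values below are illustrative choices, and MinE is omitted since it requires per-edge support counts from the raw logs):

```python
def satisfies_parameters(L, R, E, gamma, min_l, max_l, min_r, max_r,
                         min_deg_l, min_deg_r):
    """Check a candidate bipartite quasi-clique G(L, R, E) against the
    slide's parameters. E is a set of (l, r) edges."""
    # size bounds: MinL <= |L| <= MaxL and MinR <= |R| <= MaxR
    if not (min_l <= len(L) <= max_l and min_r <= len(R) <= max_r):
        return False
    # density: |E| >= gamma * |L| * |R|
    if len(E) < gamma * len(L) * len(R):
        return False
    # degree bounds on both sides
    deg_l = {l: 0 for l in L}
    deg_r = {r: 0 for r in R}
    for l, r in E:
        deg_l[l] += 1
        deg_r[r] += 1
    return (all(d >= min_deg_l for d in deg_l.values())
            and all(d >= min_deg_r for d in deg_r.values()))

# a consistent instance of the slide's example: Deg(l1) = 2, Deg(l2) = 3
L = {"l1", "l2", "l3", "l4"}
R = {"r1", "r2", "r3"}
E = {("l1", "r1"), ("l1", "r2"),
     ("l2", "r1"), ("l2", "r2"), ("l2", "r3"),
     ("l3", "r2"), ("l3", "r3"),
     ("l4", "r3")}
```

With |E| = 8 and |L||R| = 12, this graph passes at γ = 0.6 (8 ≥ 7.2) but fails at γ = 0.9 (8 < 10.8).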
  66. Specification of the Hadoop environment: 40 machines, Dell PE2950; CPU: 2 × QuadCore Xeon 5410; RAM: 2 × 4 GB (667 MHz); disk: 6 × 300 GB SATA 7.2K; OS: RHEL AS4, 32-bit.
  67. Environment for POC (diagram)
  68. Time consumption of the Clique Enumeration algorithm (1/4) (chart): time in seconds vs. number of reducers.
  69. Time consumption of the Clique Enumeration algorithm (2/4), seconds by number of reducers (1st mapper / 1st reducer / 2nd mapper / 2nd reducer / total):
     1: 27 / 1201 / 52 / 97 / 1377; 2: 27 / 556 / 27 / 54 / 664; 3: 27 / 357 / 18 / 39 / 441; 4: 27 / 306 / 15 / 33 / 381; 5: 27 / 249 / 12 / 30 / 318; 6: 27 / 225 / 12 / 27 / 291; 7: 27 / 195 / 9 / 24 / 255; 8: 27 / 193 / 9 / 23 / 252; 9: 27 / 178 / 9 / 22 / 236; 10: 27 / 165 / 9 / 21 / 222.
  70. Time consumption of the Clique Enumeration algorithm (3/4) (chart): number of logs (up to ~3.8 billion) and processing time (up to ~5,300 seconds) vs. hours of data (1, 2, 4, 8, 16).
  71. Time consumption of the Clique Enumeration algorithm (4/4), by hours of data (#records / #cliques / mappers / 1st map / 1st reduce / 2nd map / 2nd reduce / total time, seconds):
     1: 182,642,849 / 730,651 / 416 / 27 / 195 / 9 / 24 / 293; 2: 375,836,783 / 1,008,351 / 849 / 27 / 300 / 15 / 33 / 505; 4: 763,789,635 / 1,323,948 / 1,717 / 27 / 739 / 24 / 57 / 990; 8: 1,556,210,147 / 1,834,466 / 3,466 / 27 / 1,810 / 36 / 84 / 2,270; 16: 3,773,804,326 / 2,518,523 / 8,280 / 27 / 4,203 / 69 / 188 / 5,304.
  72. Multi-Layer Hash-based Matching Algorithm (diagram): cliques from T0 and T1 are placed into a hash table bucketed by clique size (Size = 1, Size = 2-5, Size > 6), so matching only compares cliques in the same size bucket.
  73. Community Matching Algorithm (1/2), matching Time0 against Time1: brute force 129,630 sec. (483,919 clique-pairs); hash-based 1,194 sec. (483,919 clique-pairs); multi-layer hash-based 110 sec. (424,213 clique-pairs), a 90.8% saving.
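The size-bucketed matching from slide 72 can be sketched as follows. The buckets mirror the slide's hash table; the Jaccard similarity measure and the 0.5 threshold are illustrative assumptions, not taken from the slides:

```python
from collections import defaultdict

def size_bucket(clique):
    # bucket boundaries follow slide 72's hash table:
    # size 1, size 2-5, and larger cliques
    n = len(clique)
    if n == 1:
        return "size=1"
    if n <= 5:
        return "size=2-5"
    return "size>5"

def match_cliques(cliques_t0, cliques_t1, threshold=0.5):
    # index Time1 cliques by size bucket, then compare each Time0
    # clique only against cliques in the same bucket
    index = defaultdict(list)
    for c in cliques_t1:
        index[size_bucket(c)].append(c)
    pairs = []
    for c0 in cliques_t0:
        for c1 in index[size_bucket(c0)]:
            similarity = len(c0 & c1) / len(c0 | c1)  # Jaccard
            if similarity >= threshold:
                pairs.append((c0, c1, similarity))
    return pairs

t0 = [frozenset({"a.example", "b.example"}), frozenset({"x.example"})]
t1 = [frozenset({"a.example", "b.example", "c.example"}),
      frozenset({"y.example"})]
pairs = match_cliques(t0, t1)
```

Bucketing is why the multi-layer variant examines fewer clique-pairs (424,213 vs. 483,919 in the table above): cliques whose sizes fall in different buckets are never compared at all.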
  74. Community Matching Algorithm (2/2) (chart): number of clique-pairs (log scale, 1 to 1,000,000) vs. similarity (0% to 100%), comparing hash-based and multi-layer hash-based.
  75. Experience Sharing: Web Reputation Service (WRS)
  76. Web Reputation System (2012/11/8, Confidential | Copyright 2012 Trend Micro Inc.)
  77. Big World, Big Data. Important numbers for WRS: 8 billion queries daily; 900 million URLs analyzed daily; < 0.01% of daily URLs identified as malicious. Finding the needle in the haystack.
  78. Processing Big Data. Content analysis: 900 million unique URLs / 24 hr ≈ 10K URLs per second, about 0.1 ms per URL. Challenge: how to coordinate, maintain, and distribute work among a large set of machines (> 500)? Raw log analysis: 3 terabytes of data each day. Challenges: how to store it so that it is reliable and fast to retrieve relevant data, and how to process the logs (present + historical, ~500 TB) to provide vital statistics and trends? Pipeline (diagram): 8 billion user queries per day → 900 million unique URLs per day → content analysis → 19K malicious URLs per day, with the raw log feeding anomaly detection, vital statistics, and a historical trend view.
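The slide's rate figures check out with simple arithmetic; a quick sanity calculation using only the numbers on the slide:

```python
# figures from the slide
urls_per_day = 900_000_000       # unique URLs analyzed daily
seconds_per_day = 24 * 60 * 60   # 86,400

urls_per_second = urls_per_day / seconds_per_day
# ~10,417 URLs per second, i.e. the slide's "10K URLs per second"

ms_per_url = 1000.0 / urls_per_second
# ~0.096 ms, matching the "0.1 ms per URL" budget
```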
  79. Today's agenda: discussion of the real-world design (constraints, requirements) and a sample of the tools available (when to use them, and how).
  80. What are we trying to do with Big Data?
  81. Usage Triangle (diagram): historical data (historical domain-IP relations, historical access patterns, known malicious actors, ...) supports detection (detect abnormal behavior, group malicious domains, flag potentially malicious URLs, ...) to answer the question: malicious activities?
  82. Constraints Triangle (diagram): storage (what data to store? how much? for how long? readily accessible; $$$), coverage (threat coverage), and latency (how fast can discovery be?).
  83. CLS Observation: like the CAP theorem, where one can only satisfy 2 out of 3 constraints, threat discovery can only satisfy 2 of the 3 constraints (Coverage, Latency, Storage):
     • (Coverage+, Latency+): it is impossible to achieve fast discovery and large coverage without an enormous data store providing the information needed for decision making.
     • (Latency+, Storage+): by focusing on a smaller set of URLs, we can provide fast discovery without the need for a huge data store.
     • (Coverage+, Storage+): by allowing a longer discovery time, we can enhance coverage without using a large data store.
  84. It is all about the trade-off.
  85. Two schools of thought (1/2). (Storage+, Latency+): attacks are wave-like in nature (sudden appearance, short lifespan), disposable (used once and thrown away), regionalized (global epidemics are less common), and few (< 0.01% of the daily unique URLs are malicious).
  86. Streaming Example (diagram)
  87. Two schools of thought (2/2). (Coverage+, Latency+): history repeats itself, and so does the hackers' infrastructure (not so throwaway); protecting coverage is essential, since threats are detectable by more thorough investigation with larger context; future-proofing matters, because our solution reflects past knowledge, and if we don't accumulate, adapt, and evolve that knowledge, our solution will become obsolete.
  88. Batch Example (diagram)
  89. It boils down to streaming vs. batch processing. Streaming looks at queries in real time, filters out unneeded URLs, and processes suspicious URLs only (Kafka, S4, Trend Messaging Exchange). Batch processing is not real-time but has broader scope (Hadoop MapReduce).
  90. Streaming Big Data: a URL and its value are ephemeral, so act fast and don't store them; useful data are few and far between, so filter them out; apply the Unix pipe concept, distributed-style: message-oriented programming.
  91. What is Message-Oriented Programming?
  92. Traditionally, systems are tightly coupled: fixed service locations, protocol-specific, difficult to change or adapt to new business requirements, and lacking separation between network handling and application logic.
  93. Mixing network and application logic is complex and wastes time:

     #include <sys/types.h>
     #include <sys/socket.h>
     #include <netinet/in.h>
     #include <arpa/inet.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <unistd.h>

     int main(void) {
         struct sockaddr_in stSockAddr;
         int SocketFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
         memset(&stSockAddr, 0, sizeof(stSockAddr));
         stSockAddr.sin_family = AF_INET;
         stSockAddr.sin_port = htons(5566);
         stSockAddr.sin_addr.s_addr = INADDR_ANY;
         bind(SocketFD, (const struct sockaddr *)&stSockAddr, sizeof(stSockAddr));
         listen(SocketFD, 10);
         int ConnectFD = accept(SocketFD, NULL, NULL);
         /* do something */
         close(ConnectFD);
         close(SocketFD);
         return 0;
     }
  96. Is that enough? Protocol independence; location independence (URL vs. channel ID); direct vs. indirect connection, replacing the connection to a server with a connection to a message bus.
  97. Further encapsulation. To attach to the message bus:
     message-source | your-app-here | message-sink
     message-source | app-1-here | app-2-here | message-sink
     Just like the Unix pipe concept: cat log.txt | gawk '{print $1}' | sort -u
  98. The messaging code is as simple as:

     #include <iostream>
     #include <string>

     int main() {
         std::string name;
         std::cin >> name;
         std::cout << "Hello! " << name;
     }
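The pipe-style encapsulation above can also be sketched end to end: each component is pure application logic over a stream of messages, and wiring components together is plain composition, mirroring message-source | app-1-here | app-2-here | message-sink. This generator-based toy is my own illustration of the concept, not the Trend Message Exchange API:

```python
def source(lines):
    # message-source: feed raw log lines onto the channel
    yield from lines

def first_field(messages):
    # app-1: pure application logic, like `gawk '{print $1}'`;
    # no network handling anywhere in sight
    for msg in messages:
        yield msg.split()[0]

def unique(messages):
    # app-2: drop duplicates, like `sort -u` (order-preserving here)
    seen = set()
    for msg in messages:
        if msg not in seen:
            seen.add(msg)
            yield msg

# message-source | app-1 | app-2 | message-sink
log = ["1.2.3.4 GET /a", "5.6.7.8 GET /b", "1.2.3.4 GET /c"]
sink = list(unique(first_field(source(log))))
```

In the distributed version each `|` becomes a message channel on the bus, so the same application functions can be relocated or rearranged without touching their logic.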
  99. Conceptually it is still data flow: each blue arrow is now a message channel / queue, and each component can be in a different location and dynamically rearranged with minimum effort.
  100. Intra-PC vs. Extra-PC messaging (diagram)
  101. Coordinating tools (1/2) (diagram)
  102. Coordinating tools (2/2) (diagram)
  103. It is not a pipe dream.
  104. Scalability. Wait, we are dealing with Big Data here!
  105. Scalability: the message bus becomes the bottleneck; each blue arrow represents input/output to the message bus.
  106. Partitioning the Message Bus (1/2). Partitioning spreads channels across different message servers, balances load, avoids network bottlenecks, and increases the number of channels the system can handle. Because of the messaging encapsulation, server selection and load balancing are automatic.
  107. Partitioning the Message Bus (2/2) (diagram)
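One simple way to spread channels across message servers, per the partitioning idea above, is a deterministic hash of the channel name: producer and consumer each resolve the same server without any coordination. Modulo hashing and the server names below are illustrative choices, not necessarily how Trend Message Exchange partitions:

```python
import hashlib

def server_for_channel(channel, servers):
    # deterministic channel -> server mapping; any client that knows
    # the server list independently resolves the same server
    digest = hashlib.md5(channel.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["mbus-1", "mbus-2", "mbus-3"]
assignment = {ch: server_for_channel(ch, servers)
              for ch in ("wrs.queries", "wrs.verdicts", "frs.samples")}
```

A drawback of plain modulo hashing is that adding a server remaps most channels; consistent hashing reduces that churn, at the cost of a slightly more involved lookup.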
  108. Message-Oriented Programming Tips
  109. Parallel Upgrade (1/2) (diagram)
  110. Parallel Upgrade (2/2) (diagram)
  111. Sharing Context (diagram)
  112. How does WRS do it?
  113. Big Data Tools
     • In-house solutions:
       – Trend Messaging Exchange: coordinates and distributes work among a large set of machines; enhanced scalability and reliability; open-sourced at https://github.com/trendmicro/tme/wiki
       – Lumber Jack, an ultra-high-efficiency indexing system: structures logs to allow < 10-second retrieval of vital statistics and information, where traditional scanning takes > 10 minutes to days (a 60× time saving); highly specialized for Trend's tasks
     • Community-supported projects: Trend-customized Hadoop/HBase data storage; involvement with the HBase steering committee; contributions back to the open-source community
  114. Big Data Begets Big Data, aka Business Intelligence. We have built a large infrastructure for processing big data, and big data generates more big data, which generates business intelligence. For example: 8 billion URLs flowing through 100 nodes will generate 800 billion log entries (conservatively estimating), ripe for business-intelligence extraction.
  115. Discussion
  116. Scale-up vs. Scale-out. http://natishalom.typepad.com/.a/6a00d835457b7453ef01348697aa8a970c-pi
  117. SQL vs. NoSQL. http://community.sageaccpac.com/blogs/r_and_d/archive/2012/01/28/nosql-for-erp.aspx
  118. Public Cloud vs. Private Cloud
  119. Rethinking cloud computing via five eliminations (用五個刪去法重新認識雲端運算):
     • The cloud is not a place
     • The cloud is not the same as server virtualization
     • The cloud does not operate as an island
     • The cloud is not a top-down development
     • The cloud is not just talk
     http://www.bnext.com.tw/focus/view/cid/103/id/23682
