Scaling big-data-mining-infra2

Usage Rights

© All Rights Reserved

Scaling big-data-mining-infra2 Presentation Transcript

  • Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience. Chris Huang (黃振修), Architect, SPN proactive cloud-based threat interception technology
  • About Me
      • Architect, SPN proactive cloud-based threat interception technology
      • Architect, SPN Hadoop computing infrastructure
      • Speaker, Hadoop in Taiwan 2013
      • Active member of Hadoop.TW
    9/10/2013 Confidential | Copyright 2013 TrendMicro Inc.
  • The Journey to Big Data
  • Yesterday
      • ~40 Hadoop nodes
      • ~15 service/user accounts
      • 3 teams
      • <50 TB storage
      • <100 jobs per day
  • Today
      • ~200 Hadoop nodes
      • ~130 service/user accounts
      • 11 teams
      • ~500 TB storage
      • >16,000 jobs per day
  • One MapReduce job submitted every 5.4 seconds
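The submission interval follows directly from the daily job count on the "Today" slide. A minimal sketch of the arithmetic, assuming jobs are spread evenly over a 24-hour day (the function name is illustrative, not from the talk):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_jobs(jobs_per_day: int) -> float:
    """Average gap between job submissions, assuming an even spread over the day."""
    return SECONDS_PER_DAY / jobs_per_day


# Job counts taken from the "Yesterday" and "Today" slides (both are bounds, not exact counts).
print(f"today:     one job every {seconds_between_jobs(16_000):.1f} s")
print(f"yesterday: one job every {seconds_between_jobs(100):.0f} s")
```

Because the slides give ">16,000" and "<100" jobs per day, the computed intervals are an upper bound for today and a lower bound for yesterday.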
  • Why? Raw Data → Actionable Intelligence
  • Collaboration in the underground
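  • Explosive Growth of Network Threats
      • Chart: New Unique Malware Discovered
      • Virus variants of every kind, spam, downloads from unknown sources, and similar network-borne threats evade detection by traditional security systems. They continue to grow explosively and pose a serious information security risk.
      • 1M unique malware samples every month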
  • Reality Check
      • Panel titles: Dangerous Risks; Skyrocketing Volume; Avoiding Detection
      • Chart: New Unique Threats per Hour (worldwide estimate*). Labels recovered: 2011, 2010, 2009, 2008, 2007; values 2,400 and 12,600; callout "a new threat every 0.28 seconds"
      • Chart: Threats Found in Enterprises (real-world data from 150+ assessments*). Categories: Network Worms, Data-Stealing Malware, IRC Bots, Targeting Malware. Values: 42%, 56%, 77%, 100% (the pairing of values to categories is not preserved in the transcript)
      • Axis labels: COMPLEXITY, DANGER
      • 52% of companies failed to report or remediate a cyber breach in 2011. (SAIC, 2011)
      • Two new pieces of malware are created every second. (Trend Micro, 2012)
      • A cyber intrusion occurs every 5 minutes. (US CERT, 2012)
  • The traditional approach is no longer sufficient!
  • Big Data Exploration
  • A New Approach to Cyber Threat Solutions
      • Data sources: Web Crawler; Trend Micro Endpoint Protection; Trend Micro Mail Protection; Trend Micro Web Protection; Honeypot; CDN / xSP; Researcher Intelligence
      • 3+ billion worldwide sensors
  • SPN: Smart Protection Network
      • Collects, Protects, Identifies
      • Big data analytics: data mining, machine learning, modeling, correlation
      • Daily stats:
          – 7.2 TB of data correlated
          – 1B IP addresses
          – 90K malicious threats identified
          – 100+M good files
  • SPN High Level Architecture
      • Data Sourcing: CDN/xSP Log; Honey Pot; SPN Feedback
      • Receiver; Trend Message Exchange (Message Bus)
      • Hadoop Distributed File System (HDFS); HBase; MapReduce; Ad hoc Query (Pig); Oozie
      • MySPN Platform; Solr Cloud; API Server/Portal; Service Platform
      • Service Delivery: APP 1; APP 2
  • MySPN Ecosystem (diagram labels, in transcript order): Portal & API Single Entry-Point; SPN Infrastructure; APT KB Service; TopCVE Service; APT KB; VE DB; FB Logs; Census; MySPN Market Place; Service Platform; SSO; New App; OPS; RD / Team; Monitor; SDK; All My Guard Threat Connect Dashboard Service Catalog Census Profile Alert New App; Dispatcher; Access; Login; Trender; Need Solution; Customer; Publish; Implement; Operate; Develop; Solution backed-by Data Catalogue
  • SPN Solution Architecture
      • Data types: File; URL; Web / URL; Email; Domain; IP
      • Services: File Reputation Service; Email Reputation Service; Web Reputation Service; SPN Correlation
      • Customer; Smart Protection; Community Intelligence (feedback loop)
      • Stages: Sourcing; Processing & Analysis; Validate & Create Solution; Quality Assurance; Solution Distribution; Solution Adoption
  • Big Data Case Study
  • Case Study: Web Reputation Services
      • Diagram labels: Internet; Web Server; SPN Cloud; "1. Intercept URL"; "4. Access page"
      • 200K+ new URLs created every day
  • 8+ billion URLs processed daily
      • Process: User Traffic / Sourcing (CDN vendor); Rating Server for Known Threats; Unknown & Prefilter; Page Download; Threat Analysis
      • Volumes: 8 billion/day; 40% filtered; 4.8 billion/day; 82% filtered; 860 million/day; 99.98% filtered; 25,000 malicious URLs/day
      • Technology: Trend Micro Products / Technology; CDN Cache; High Throughput Web Service; Hadoop Cluster; Web Crawling; Machine Learning; Data Mining
      • Operation: block a malicious URL within 15 minutes of it going online!
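The filtering percentages on this slide can be checked against its daily counts. The sketch below recomputes the share removed at each step from the four counts as printed; the list order and the function name are assumptions made for illustration. The first two steps come out at about 40% and 82%, matching the slide. The last step computes to roughly 99.997%, slightly above the printed 99.98%, so that figure may be rounded or measured against a different base.

```python
# Daily URL counts as printed on the slide, in funnel order (assumed ordering).
COUNTS_PER_DAY = [8_000_000_000, 4_800_000_000, 860_000_000, 25_000]


def percent_filtered(counts):
    """Percentage of URLs removed between each pair of consecutive funnel stages."""
    return [
        100.0 * (before - after) / before
        for before, after in zip(counts, counts[1:])
    ]


for step, pct in enumerate(percent_filtered(COUNTS_PER_DAY), start=1):
    print(f"step {step}: {pct:.3f}% filtered")
```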
  • WRS Architecture Overview
  • Big Data Lessons Learned
  • How to Scale?
      • Unstructured data first
      • If you really need structured data, use
          – Google Protocol Buffers, or
          – a JSON string
      • Purify your data before processing
      • Leverage HBase more
          – Design row keys well to prevent hot-spotting
      • Use MapReduce to create Lucene indexes
      • Leverage SolrCloud for complex real-time use cases
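HBase stores rows sorted by row key, so keys that grow monotonically (for example, keys that start with a timestamp) send all new writes to the same region. One common remedy is to prefix each key with a small hash-derived "salt". The sketch below illustrates that idea in plain Python; it is not the row-key design used in SPN, which the slides do not describe. The bucket count, key layout, and function name are all assumptions.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical; often chosen near the number of region servers


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a deterministic bucket so sequential keys scatter."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}|{natural_key}"


# Hypothetical timestamp-first keys, which would otherwise sort next to each other.
keys = [f"20130910T{minute:04d}|example.com/page{minute}" for minute in range(1000)]
spread = Counter(salted_row_key(key).split("|", 1)[0] for key in keys)

for bucket, count in sorted(spread.items()):
    print(bucket, count)
```

The salt is derived from the key itself, so a reader can recompute it for point lookups. The trade-off is that a range scan over the natural key has to be issued once per bucket.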
  • Our Learning
      • Have a clear strategy first
      • Start small, scale quickly
      • Choose the right solution for the right problem
  • Q&A
  • Big Challenge, Big Opportunity. Thank You