Who am I?
• Mammi Chang 張雅芳
• Engineer, SPN, Trend Micro
• SPN Hadoop Cluster Administrator for 2 years
• Developer of operations tools
• Expertise: HDFS/HBase/Pig
• Experience with Mahout recommendation systems
Web Reputation: 8+ billion URLs processed daily
Technology Process Operation
User Traffic / Sourcing
CDN vendor
Rating Server for Known Threats
Unknown & Prefilter
Page Download
Threat Analysis
8 billion/day
40% filtered
4.8 billion/day
82% filtered
860 million/day
99.98% filtered
25,000 malicious URLs/day
Trend Micro
Products / Technology
CDN Cache
High Throughput Web Service
Hadoop Cluster
Web Crawling
Machine Learning
Data Mining
Block a malicious URL within 15 minutes of it going online!
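The first two stages of the filtering funnel on this slide can be checked with a few lines of arithmetic. This is only a sanity-check sketch: the helper name `remaining` and the variable names are made up for illustration, and the inputs are the rounded figures from the slide.

```python
def remaining(volume: int, filtered_percent: int) -> int:
    """Return how many URLs per day pass a stage that filters `filtered_percent`% of them."""
    return volume * (100 - filtered_percent) // 100


daily_urls = 8_000_000_000                            # 8 billion URLs/day enter the funnel
after_first_filter = remaining(daily_urls, 40)        # 40% filtered
after_second_filter = remaining(after_first_filter, 82)  # 82% filtered

print(f"{after_first_filter:,}")   # 4,800,000,000
print(f"{after_second_filter:,}")  # 864,000,000 (the slide rounds this to 860 million)
```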
SPN Solution Architecture
File
Web / URL
Email
Domain
IP
File Reputation Service
Email Reputation Service
Customer
Smart Protection
Community Intelligence
(Feedback loop)
Web Reputation Service
Sourcing
Processing & Analysis
Validate & Create Solution
Quality Assurance
Solution Distribution
Solution Adoption
SPN Correlation
SPN Hadoop Use Cases
Marketing
Report Near real time
Service Researcher
Data Scientist
Hadoop Platform
query
Service
Batch processing
data
business value
information
HBase HDFS
Yesterday
~40 Hadoop nodes
~15 Service/user accounts
3 Teams
<50 TB storage
<100 Jobs per day
Today
Hundreds of Hadoop nodes
>170 Service/user accounts
>13 Teams
~1.5 PB storage
>16000 Jobs per day
Real World Difficulties on Deployment
• Hundreds of servers
• Complicated Hadoop ecosystem deployment
• Need for configuration management
• Limited maintenance time
Hadoop Ecosystem
Puppet (IT automation software)
Hadooppet
A project for deploying the Trend Micro Hadoop distribution on a large cluster
Hadooppet Workflow – Cluster Deployment
/etc/puppet
|-- auth.conf
|-- fileserver.conf
|-- puppet.conf
`-- ssl
/etc/puppet
|-- auth.conf
|-- autosign.conf
|-- files
|-- fileserver.conf
|-- manifests
|-- modules
|-- puppet.conf
`-- ssl
Puppet server
Yum Server
Hadoop Node (Puppet Client), repeated per node:
• Pull packages from Yum Server
• Auto-deploy Hadoop by role
1. Certificate request
2. Sign certificate
3. Retrieve catalog for nodes
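The three numbered steps in the workflow can be modeled with a small simulation. This is a toy sketch, not Puppet's or Hadooppet's actual code: the class names, the role names, and the package names in `ROLE_PACKAGES` are all hypothetical, since the slides do not show the real manifests.

```python
from dataclasses import dataclass, field

# Hypothetical role -> package mapping (assumption; not taken from the slides).
ROLE_PACKAGES = {
    "namenode": ["hadoop-namenode"],
    "datanode": ["hadoop-datanode", "hbase-regionserver"],
}


@dataclass
class PuppetServer:
    signed: set = field(default_factory=set)

    def sign(self, hostname: str) -> None:
        """Step 2: sign the certificate request sent by a client."""
        self.signed.add(hostname)

    def catalog_for(self, hostname: str, role: str) -> list:
        """Step 3: return the catalog (here just a package list) for a signed node."""
        if hostname not in self.signed:
            raise PermissionError(f"{hostname} has no signed certificate")
        return list(ROLE_PACKAGES[role])


@dataclass
class PuppetClient:
    hostname: str
    role: str
    installed: list = field(default_factory=list)

    def deploy(self, server: PuppetServer) -> None:
        server.sign(self.hostname)                              # steps 1-2: request, then sign
        catalog = server.catalog_for(self.hostname, self.role)  # step 3: retrieve catalog
        self.installed.extend(catalog)  # stands in for pulling packages from the Yum server


server = PuppetServer()
nodes = [PuppetClient("node1", "namenode"), PuppetClient("node2", "datanode")]
for node in nodes:
    node.deploy(server)
```

The point of the model is that a node only needs to know its role; what gets installed is decided centrally by the server's catalog.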
Real World Difficulties on Hadoop Distribution
• Too many running services to make big changes
• No suitable Hadoop version for Trend Micro
• Always need to patch for our own needs
Trend Micro Hadoop (TMH)
• Be flexible. Pick the features the business needs
• Fetch official patches into the currently adopted version
• Add your own patches at any time
ISSUE TRACKING
• Jira
DEVELOPMENT
• Gitlab
• Hudson
TESTING
• Dumbo Cluster
• POC / Staging
DEPLOYMENT
• Hadooppet
PROFILING
• Nagios, Ganglia
• Splunk
MANAGEMENT
• Hadooppet
TMH Development Process
Jira
• Tracking issues
Gitlab
• Version control of source code
Unit Test
• Developers run unit tests on their local development machines
Hudson
• Build / test software projects
Yum Server
• Automatic updates, package and dependency management
POC Hadoop Cluster
Staging Hadoop Cluster
Production Hadoop Cluster
Yum Server
Developer
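One way to read this diagram is as a staged rollout, where a build published to the Yum Server only moves to the next cluster after it has been validated on the previous one. The sketch below assumes that reading; the promotion order and the function name `clusters_reached` are assumptions, not something the slides spell out.

```python
# Assumed promotion order, read off the diagram: POC -> Staging -> Production.
STAGES = ["POC", "Staging", "Production"]


def clusters_reached(passed: dict) -> list:
    """Return the clusters a build gets deployed to.

    `passed` maps a stage name to whether validation succeeded there.
    The build is deployed to each stage in order and stops at the first
    stage where validation did not succeed.
    """
    reached = []
    for stage in STAGES:
        reached.append(stage)
        if not passed.get(stage, False):
            break
    return reached


# A build that fails on Staging never reaches Production.
print(clusters_reached({"POC": True, "Staging": False}))
# A build that passes POC and Staging is deployed to all three clusters.
print(clusters_reached({"POC": True, "Staging": True}))
```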