SlideShare a Scribd company logo
Trend Micro SPN Hadoop 
Overview 
張雅芳Mammi Chang 
@ 2014 Taiwan HadoopCon
Who am I ? 
• Mammi Chang 張雅芳 
• Engineer, SPN, Trend Micro 
• SPN Hadoop Cluster Administrator for 2 years 
• Developer of operation tool 
• Expertise : HDFS/Hbase/Pig 
• Experience on Mahout Recommendation System
3 
Why Big Data 
in Trend Micro?
Web Reputation 8+ billions URL process daily 
Technology Process Operation 
User Traffic / Sourcing 
CDN vender 
Rating Server for Known Threats 
Unknown & Prefilter 
Page Download 
Threat 
Analysis 
8 billions/day 
4.8 billions/day 
40% filtered 
82% filtered 
860 millions/day 
99.98% filtered 
25,000 malicious URL /day 
Trend Micro 
Products / Technology 
CDN Cache 
High Throughput Web Service 
Hadoop Cluster 
Web Crawling 
Machine Learning 
Data Mining 
Block malicious URL within 15 minutes once it goes online!
SPN Solution Architecture 
File 
Web / URL 
Email 
Domain 
IP 
File Reputation Service 
Email Reputation Service 
Customer 
Smart Protection 
Community Intelligence 
(Feedback loop) 
Web Reputation Service 
Sourcing 
Processing 
& Analysis 
Validate & 
Create Solution 
Quality Assurance 
Solution 
Distribution 
Solution 
Adoption 
SPN Correlation
SPN Hadoop Use Cases 
Marketing 
Report Near real time 
6 
Service Researcher 
Data Scientist 
Hadoop Platform 
query 
Service 
Batch 
processing 
data 
business 
value 
information 
HBase HDFS
Yesterday 
~40 Hadoop nodes 
~15 Service/user accounts 
7 
3 Teams 
<50 TB storage 
<100 Jobs per day
Today 
hundreds Hadoop nodes 
>170 Service/user accounts 
>13 Teams 
~1.5 PB storage 
>16000 Jobs per day 
8
1 MapReduce Job 
9 
Submitted 
Each 5.4 Seconds
10 
Central 
Management 
Hadoop 
as a 
Service 
Automation 
Highly 
Availability 
Customizatio 
n
Real World Difficulties on Deployment 
• Hundreds of servers 
• Complicated Hadoop ecosystem deployment 
• Necessary of configuration management 
• Limited maintenance time 
11
Hadoop 
Ecosystem 
Puppet 
Hadooppet 
A project for deploy 
Trend Micro Hadoop 
distribution on a 
large cluster 
12 
IT automation 
software
Hadooppet Workflow – Cluster Deployment 
13 
………. 
/etc/puppet 
|-- auth.conf 
|-- fileserver.conf 
|-- puppet.conf 
`-- ssl 
/etc/puppet 
|-- auth.conf 
|-- autosign.conf 
|-- files 
|-- fileserver.conf 
|-- manifests 
|-- modules 
|-- puppet.conf 
`-- ssl 
Puppet 
server 
Yum 
Server 
Pull packages from Yum Server 
Auto-deploy Hadoop 
by role 
Puppet 
Client 
Auto-deploy Hadoop 
by role 
Puppet 
Client 
Auto-deploy Hadoop 
by role 
Puppet 
Client 
1. certificate 
request 
2. Sign certificate 
3. Retrive catalog for 
nodes 
Hadoop Node Hadoop Node Hadoop Node
Hadooppet Workfolw – Change Configuration 
15 
/etc/puppet 
|-- auth.conf 
|-- autosign.conf 
|-- files 
|-- fileserver.conf 
|-- manifests 
|-- modules 
|-- puppet.conf 
`-- ssl 
Hadoop Node Hadoop Node ………. Hadoop Node 
/etc/puppet 
|-- auth.conf 
|-- fileserver.conf 
|-- puppet.conf 
`-- ssl 
Puppet 
server 
Puppet 
Client 
Puppet 
Client 
Puppet 
Client 
conf 
2. Synchronize 
Configuration 
1. Modify 
configuration at 
server side
CLUSTER DEPLOYMENT BY 
DISTRIBUTION / ENVIRONMENT 
• POC, Staging, Production 
• All-in-one VM, AWS EC2 deployment 
CLUSTER DEPLOYMENT 
• Package installation 
• Configuration adjustment 
CLUSTER OPERATION 
• Add new Hadoop node/client 
• Account management 
• Process management 
Hadooppet 
SANITY CHECK 
• DFSIO, YCSB , etc 
• Sample Applications 
16
Anything 
more? 
17
Real World Difficulties on Hadoop Distribution 
• Too many running services to do big change 
• No suitable Hadoop version for Trend Micro 
• Always need to patch for our need 
18
Trend Micro Hadoop (TMH) 
• Be flexible. Pick up 
Business needed 
features 
• Fetch official patches 
in to current adopted 
version 
• Add your own patch at 
any time 
ISSUE 
TRACKING 
• Jira 
DEVELOPMEN 
T 
• Gitlab 
• Hudson 
TESTING 
• Dumbo Cluster 
• POC / Staging 
DEPLOYMENT 
• Hadooppet 
PROFILING 
• Nagios , Ganglia 
• Splunk 
MANAGEMEN 
T 
• Hadooppet
TMH Development Process 
Jira 
• Tracking 
Issues 
Gitlab 
• Version 
control of 
source code 
Unit Test 
• Developer run 
unit test at 
development 
local machine 
Hudson 
• Build / test 
software 
projects 
Yum 
Server 
• Automatic 
updates, 
package and 
dependency 
management 
20
21 
POC Hadoop Cluster 
Staging Hadoop Cluster 
Production Hadoop 
Cluster 
Yum 
Server 
Developer
22 
POC Hadoop Cluster 
Staging Hadoop Cluster 
Production Hadoop 
Cluster 
Yum 
Server 
Developer
Hadoop Cluster Profiling 
• Availability 
– Process Healthy 
– Cluster Healthy 
– System Healthy 
• Utilization 
– Cluster Usage 
– Log Analysis 
• Auditing
Nagios 
• Service healthy monitor 
• Cluster healthy monitor 
Ganglia 
• System monitor / Hadoop metrics monitor 
• Cluster resource monitor 
Splunk 
• Application /Cluster Resource Profiling 
• Auditing/Log Analysis 
24
I feel cluster HDFS 
become slow 
recently…. 
Really? From when? 
Do you have any detail 
information or log? 
Case Study 
USE 
R 
me
…………, 
Let me check on it 
Okay! 
USE 
R 
me
Sorry, we have no 
log now. 
But it is really slow. 
…………………. 
15 minutes later ….. 
USE 
R 
me
What can 
I do? 
• Check on Nagios services 
alert 
• Check Splunk Cluster HDFS 
Profiling, recently user usage 
• Check Ganglia cluster loading
• Check on Nagios services 
alert 
• Check Splunk Cluster HDFS 
Profiling, recently user usage 
• Check Ganglia cluster loading 
I can do… 
=> Finding Root 
Cause !
30 
Central 
Management 
Hadoop 
as a 
Service 
Automation 
Highly 
Availability 
Customizatio 
n
Tomorrow 
YARN 
31 
MRv2 
Spark? 
Impala? 
We choose what we really want!
Thank you! 
WE ARE HIRING! WELCOME TO JOIN TMH! 
#TrendInsight

More Related Content

What's hot

Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
DataWorks Summit
 
Kafka Security
Kafka SecurityKafka Security
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
Steve Loughran
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
Gwen (Chen) Shapira
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
shrey mehrotra
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
DataWorks Summit
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka Security
DataWorks Summit
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Cloudera, Inc.
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
Gwen (Chen) Shapira
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
Kai Sasaki
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
Gwen (Chen) Shapira
 
RedisConf18 - Writing modular & encapsulated Redis code
RedisConf18 - Writing modular & encapsulated Redis codeRedisConf18 - Writing modular & encapsulated Redis code
RedisConf18 - Writing modular & encapsulated Redis code
Redis Labs
 
RedisConf18 - 2,000 Instances and Beyond
RedisConf18 - 2,000 Instances and BeyondRedisConf18 - 2,000 Instances and Beyond
RedisConf18 - 2,000 Instances and Beyond
Redis Labs
 
YARN
YARNYARN
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
Amazon Web Services
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
DataWorks Summit
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 

What's hot (20)

Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka Security
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
RedisConf18 - Writing modular & encapsulated Redis code
RedisConf18 - Writing modular & encapsulated Redis codeRedisConf18 - Writing modular & encapsulated Redis code
RedisConf18 - Writing modular & encapsulated Redis code
 
RedisConf18 - 2,000 Instances and Beyond
RedisConf18 - 2,000 Instances and BeyondRedisConf18 - 2,000 Instances and Beyond
RedisConf18 - 2,000 Instances and Beyond
 
YARN
YARNYARN
YARN
 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 

Similar to HadoopCon- Trend Micro SPN Hadoop Overview

Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
Chiou-Nan Chen
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
Jim Kaskade
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
DataWorks Summit
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
DataWorks Summit
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
fann wu
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld
 
Oscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to ProductionOscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to Production
Patrick Chanezon
 
12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodology
laeshin park
 
Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop
Cloudera, Inc.
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Ian Lumb
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Agile infrastructure
Agile infrastructureAgile infrastructure
Agile infrastructure
Tarun Rajput
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
Mydbops
 
OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20
Tyrone Systems
 

Similar to HadoopCon- Trend Micro SPN Hadoop Overview (20)

Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Oscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to ProductionOscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to Production
 
12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodology
 
Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop Harnessing the Power of Apache Hadoop
Harnessing the Power of Apache Hadoop
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Agile infrastructure
Agile infrastructureAgile infrastructure
Agile infrastructure
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20
 

Recently uploaded

Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
chetankumar9855
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 

Recently uploaded (20)

Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 

HadoopCon- Trend Micro SPN Hadoop Overview

  • 1. Trend Micro SPN Hadoop Overview 張雅芳Mammi Chang @ 2014 Taiwan HadoopCon
  • 2. Who am I ? • Mammi Chang 張雅芳 • Engineer, SPN, Trend Micro • SPN Hadoop Cluster Administrator for 2 years • Developer of operation tool • Expertise : HDFS/Hbase/Pig • Experience on Mahout Recommendation System
  • 3. 3 Why Big Data in Trend Micro?
  • 4. Web Reputation 8+ billions URL process daily Technology Process Operation User Traffic / Sourcing CDN vender Rating Server for Known Threats Unknown & Prefilter Page Download Threat Analysis 8 billions/day 4.8 billions/day 40% filtered 82% filtered 860 millions/day 99.98% filtered 25,000 malicious URL /day Trend Micro Products / Technology CDN Cache High Throughput Web Service Hadoop Cluster Web Crawling Machine Learning Data Mining Block malicious URL within 15 minutes once it goes online!
  • 5. SPN Solution Architecture File Web / URL Email Domain IP File Reputation Service Email Reputation Service Customer Smart Protection Community Intelligence (Feedback loop) Web Reputation Service Sourcing Processing & Analysis Validate & Create Solution Quality Assurance Solution Distribution Solution Adoption SPN Correlation
  • 6. SPN Hadoop Use Cases Marketing Report Near real time 6 Service Researcher Data Scientist Hadoop Platform query Service Batch processing data business value information HBase HDFS
  • 7. Yesterday ~40 Hadoop nodes ~15 Service/user accounts 7 3 Teams <50 TB storage <100 Jobs per day
  • 8. Today hundreds Hadoop nodes >170 Service/user accounts >13 Teams ~1.5 PB storage >16000 Jobs per day 8
  • 9. 1 MapReduce Job 9 Submitted Each 5.4 Seconds
  • 10. 10 Central Management Hadoop as a Service Automation Highly Availability Customizatio n
  • 11. Real World Difficulties on Deployment • Hundreds of servers • Complicated Hadoop ecosystem deployment • Necessary of configuration management • Limited maintenance time 11
  • 12. Hadoop Ecosystem Puppet Hadooppet A project for deploy Trend Micro Hadoop distribution on a large cluster 12 IT automation software
  • 13. Hadooppet Workflow – Cluster Deployment 13 ………. /etc/puppet |-- auth.conf |-- fileserver.conf |-- puppet.conf `-- ssl /etc/puppet |-- auth.conf |-- autosign.conf |-- files |-- fileserver.conf |-- manifests |-- modules |-- puppet.conf `-- ssl Puppet server Yum Server Pull packages from Yum Server Auto-deploy Hadoop by role Puppet Client Auto-deploy Hadoop by role Puppet Client Auto-deploy Hadoop by role Puppet Client 1. certificate request 2. Sign certificate 3. Retrive catalog for nodes Hadoop Node Hadoop Node Hadoop Node
  • 14. Hadooppet Workfolw – Change Configuration 15 /etc/puppet |-- auth.conf |-- autosign.conf |-- files |-- fileserver.conf |-- manifests |-- modules |-- puppet.conf `-- ssl Hadoop Node Hadoop Node ………. Hadoop Node /etc/puppet |-- auth.conf |-- fileserver.conf |-- puppet.conf `-- ssl Puppet server Puppet Client Puppet Client Puppet Client conf 2. Synchronize Configuration 1. Modify configuration at server side
  • 15. CLUSTER DEPLOYMENT BY DISTRIBUTION / ENVIRONMENT • POC, Staging, Production • All-in-one VM, AWS EC2 deployment CLUSTER DEPLOYMENT • Package installation • Configuration adjustment CLUSTER OPERATION • Add new Hadoop node/client • Account management • Process management Hadooppet SANITY CHECK • DFSIO, YCSB , etc • Sample Applications 16
  • 17. Real World Difficulties on Hadoop Distribution • Too many running services to do big change • No suitable Hadoop version for Trend Micro • Always need to patch for our need 18
  • 18. Trend Micro Hadoop (TMH) • Be flexible. Pick up Business needed features • Fetch official patches in to current adopted version • Add your own patch at any time ISSUE TRACKING • Jira DEVELOPMEN T • Gitlab • Hudson TESTING • Dumbo Cluster • POC / Staging DEPLOYMENT • Hadooppet PROFILING • Nagios , Ganglia • Splunk MANAGEMEN T • Hadooppet
  • 19. TMH Development Process Jira • Tracking Issues Gitlab • Version control of source code Unit Test • Developer run unit test at development local machine Hudson • Build / test software projects Yum Server • Automatic updates, package and dependency management 20
  • 20. 21 POC Hadoop Cluster Staging Hadoop Cluster Production Hadoop Cluster Yum Server Developer
  • 21. 22 POC Hadoop Cluster Staging Hadoop Cluster Production Hadoop Cluster Yum Server Developer
  • 22. Hadoop Cluster Profiling • Availability – Process Healthy – Cluster Healthy – System Healthy • Utilization – Cluster Usage – Log Analysis • Auditing
  • 23. Nagios • Service healthy monitor • Cluster healthy monitor Ganglia • System monitor / Hadoop metrics monitor • Cluster resource monitor Splunk • Application /Cluster Resource Profiling • Auditing/Log Analysis 24
  • 24. I feel cluster HDFS become slow recently…. Really? From when? Do you have any detail information or log? Case Study USE R me
  • 25. …………, Let me check on it Okay! USE R me
  • 26. Sorry, we have no log now. But it is really slow. …………………. 15 minutes later ….. USE R me
  • 27. What can I do? • Check on Nagios services alert • Check Splunk Cluster HDFS Profiling, recently user usage • Check Ganglia cluster loading
  • 28. • Check on Nagios services alert • Check Splunk Cluster HDFS Profiling, recently user usage • Check Ganglia cluster loading I can do… => Finding Root Cause !
  • 29. 30 Central Management Hadoop as a Service Automation Highly Availability Customizatio n
  • 30. Tomorrow YARN 31 MRv2 Spark? Impala? We choose what we really want!
  • 31. Thank you! WE ARE HIRING! WELCOME TO JOIN TMH! #TrendInsight