Facing Enterprise-specific Challenges –
Utility Programming in Hadoop
吳育儒 Fann Wu
HadoopCon 2015

  1. Facing Enterprise-specific Challenges – Utility Programming in Hadoop
      吳育儒 Fann Wu
  2. Who am I?
      • Fann Wu 吳育儒
      • Sr. Engineer, SPN, Trend Micro
      • Hadoop Cluster Admin
      • Splunk Cluster Admin
      • Monitor Admin
      • Handyman (水電工): architecture, operation, troubleshooting, automation, performance tuning
  3. Agenda
      • How to manage a big cluster
      • How to manage a big Hadoop cluster
      • Datacenter & AWS
  4. How to manage a big cluster
  5. Martial arts
      • Basic mindset
      • Basic moves
  6. Mindset
  7. Moves: having no fixed move beats having one
  8. Mindset
      • Manage hundreds of servers as if they were one server
      • Look after hundreds of servers the way you would look after a girlfriend
      • You only sleep well when the servers are stable
      • Putting Kuai Kuai (乖乖) snacks on the servers is a must
  9. Cluster http://www.quuxlabs.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
  10. Move: shell script. Practical and homemade!
  11. Move: PSSH
      PSSH provides parallel versions of OpenSSH and related tools. Included are pssh, pscp, prsync, pnuke, and pslurp.
      https://code.google.com/p/parallel-ssh/
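
As a rough sketch of how pssh might be driven from a script, the Python snippet below only builds the command line; it does not run anything. The hosts file name, login user, and remote command are hypothetical. The options used are pssh's hosts file (`-h`), login user (`-l`), parallelism (`-p`), timeout (`-t`), and inline output (`-i`).

```python
import shlex


def build_pssh_command(hosts_file, user, remote_command, parallelism=32, timeout_s=60):
    """Return the argv list that runs one command on every host listed in hosts_file."""
    return [
        "pssh",
        "-h", hosts_file,        # file with one host per line
        "-l", user,              # login user on the remote hosts
        "-p", str(parallelism),  # maximum number of concurrent connections
        "-t", str(timeout_s),    # per-host timeout in seconds
        "-i",                    # print each host's output inline
        remote_command,
    ]


def to_shell_string(argv):
    """Quote the argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(to_shell_string(build_pssh_command("hosts.txt", "hadoop", "uptime")))
```

The argv list can be handed to `subprocess.run` unchanged, which avoids shell quoting problems when the remote command contains spaces.
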
  12. Move: SaltStack http://saltstack.com/
  13. Move: SaltStack
      • Roles:
        - Salt Master
        - Salt Minions
        - Web UI
  14. Move: SaltStack
      • Install SaltStack from EPEL (CentOS 6)
  15. Move: SaltStack
      • Configure SaltStack and start the SaltStack service
      • Master: add "interface: 192.168.50.8" to /etc/salt/master
      • Minion: add "master: 192.168.50.8" to /etc/salt/minion
  16. Move: SaltStack
      • List the unaccepted keys, then accept all keys
  17. Move: SaltStack
      • Use the salt command to control the cluster
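
The SaltStack steps on slides 15 to 17 can be summarised in a short Python sketch. It returns the two configuration lines and the commands to run, without executing anything. The slides do not say which salt command was demonstrated, so `test.ping` is an assumed example. `salt-key -L` lists keys and `salt-key -A` accepts all pending keys after a confirmation prompt.

```python
def salt_bootstrap_plan(master_ip):
    """Return the config lines and commands for a minimal Salt master/minion setup."""
    return {
        # Line to add to /etc/salt/master (YAML needs the space after the colon).
        "master_conf": f"interface: {master_ip}",
        # Line to add to /etc/salt/minion.
        "minion_conf": f"master: {master_ip}",
        "commands": [
            ["salt-key", "-L"],          # list accepted and unaccepted minion keys
            ["salt-key", "-A"],          # accept all pending keys (asks for confirmation)
            ["salt", "*", "test.ping"],  # assumed example: check that every minion answers
        ],
    }


if __name__ == "__main__":
    plan = salt_bootstrap_plan("192.168.50.8")
    print(plan["master_conf"])
    print(plan["minion_conf"])
    for argv in plan["commands"]:
        print(" ".join(argv))
```
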
  18. Move: SaltStack UI, Halite
  19. Configuration Management
  20. Chef
  21. Ansible
  22. Puppet
  23. Puppet Web: Foreman
  24. Wait... where is Hadoop?
  25. How to manage a big Hadoop cluster
  26. Trend Micro Hadoop
      • Servers: hundreds of servers
      • Users: 200 accounts
      • Daily input data: 2 TB
      • Daily jobs: hundreds of jobs
  27. We need...
      • Hadoop as a Service
      • Central management
      • Automation
      • High availability
      • Customization
  28. So...
      • Hadoop ecosystem
      • Puppet: IT automation software
      • Hadooppet: a project for deploying the Trend Micro Hadoop distribution on a large cluster
  29. Hadooppet
      • Cluster deployment by distribution / environment
        - POC, staging, production
        - All-in-one VM, AWS EC2 deployment
      • Cluster deployment
        - Package installation
        - Configuration adjustment
      • Cluster operation
        - Add new Hadoop node/client
        - Account management
        - Process management
      • Sanity check
        - DFSIO, YCSB, etc.
        - Sample applications
      • We can easily deploy hundreds of servers within one hour
  30. Hadoop Security
  31. Hadoop Security: without security
      • From any machine that can access Hadoop
      • [root@hackserver opt]# su hdfs
      • [hdfs@hackserver opt]$ hadoop fs -rmr /
      • Say goodbye to your data
      • Any resemblance to real events is purely coincidental
  32. Hadoop Security: Kerberos
  33. Hadoop Security: Kerberos
      • Without authentication
      • With authentication passed
  34. Kerberos Common Problem
      • Problem: "Clock skew too great while getting initial credentials"
      • Solution: use date or ntpdate to sync the time
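
To illustrate the clock-skew problem, here is a minimal Python sketch that checks whether two clocks are close enough for Kerberos. It assumes the MIT Kerberos default tolerance of 300 seconds, which a site may have changed. The function name and sample timestamps are illustrative; how the KDC's time is obtained is left out.

```python
from datetime import datetime, timedelta, timezone

# MIT Kerberos default tolerance; a site may configure a different value.
DEFAULT_MAX_SKEW = timedelta(seconds=300)


def clock_skew_ok(local_time, kdc_time, tolerance=DEFAULT_MAX_SKEW):
    """Return True when the two clocks differ by no more than the tolerance."""
    return abs(local_time - kdc_time) <= tolerance


if __name__ == "__main__":
    kdc = datetime(2015, 9, 19, 12, 0, 0, tzinfo=timezone.utc)
    drifted = kdc + timedelta(minutes=7)  # a client whose clock runs 7 minutes fast
    print(clock_skew_ok(drifted, kdc))
```

A host that fails this check needs its clock synchronised before `kinit` will succeed.
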
  35. Hadoop Security: Folder Permission
      • POSIX permissions
      • POSIX ACLs (Access Control Lists)
  36. Hadoop Security: more security, but still in incubation
      http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html
  37. Hadoop HA
  38. Sleep well: HA
      • Kerberos HA
      • LDAP HA
      • NameNode HA
      • JobTracker HA / ResourceManager HA
      • HBase HA
  39. Metric/Monitor tools
  40. Without metric/monitor tools
  41. Know the present and the future: Ganglia
  42. Know the real problem: Nagios
  43. Know the real problem: Nagios mail
  44. Find the detailed logs and generate fancy reports: Splunk
  45. Job collector
  46. User Mapper/Reducer usage
  47. Failed Job Summary
  48. Cluster Mapper Usage/Pending
  49. Cluster Reducer Usage/Pending
  50. Other Tools
  51. Offline Image Viewer
      • Transform the fsimage from binary to text
      • cat fsimage.txt
  52. Total number of files for each user (computed with Pig)
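
The per-user file count on slide 52 was done with Pig. As an illustration only, the same count can be sketched in Python over the Offline Image Viewer's text output. The column layout is an assumption: tab-delimited rows with the permission string in column 10 and the user name in column 11, as produced by the Delimited processor in Hadoop 2.x. Check it against your own fsimage.txt before relying on it.

```python
from collections import Counter

# Assumed zero-based column positions in the Delimited output.
PERMISSION_COL = 9
USER_COL = 10


def count_files_per_user(lines, delimiter="\t"):
    """Count regular files per owner; directories and malformed rows are skipped."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) <= USER_COL:
            continue  # too few columns to hold an owner
        # Regular files have a permission string starting with "-";
        # directories start with "d", and a header row starts with neither.
        if fields[PERMISSION_COL].startswith("-"):
            counts[fields[USER_COL]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        "\t".join(["/data/a.log", "3", "2015-09-01 10:00", "2015-09-01 10:00",
                   "134217728", "1", "1024", "0", "0", "-rw-r--r--", "alice", "hadoop"]),
        "\t".join(["/data", "0", "2015-09-01 10:00", "1970-01-01 00:00",
                   "0", "0", "0", "-1", "-1", "drwxr-xr-x", "alice", "hadoop"]),
    ]
    print(count_files_per_user(rows).most_common())
```

For a real image, pass the open file object directly, for example `count_files_per_user(open("fsimage.txt"))`, so the file is streamed rather than loaded into memory.
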
  53. Common DataNode Decommission Process
      • Add the DataNodes' hostnames to the dfs.exclude file
      • On the NameNode host, run hdfs dfsadmin -refreshNodes
      • Check the Web UI to see whether the state has changed to "Decommission In Progress" for the DataNodes being decommissioned (1 to 2 days)
      • When all the DataNodes report their state as "Decommissioned", you can shut down the decommissioned nodes
      • Replace the crashed HDD, reboot the server, and reconfigure the HDD from the RAID card (20 mins)
      • Mount the HDD
      • Start the DataNode service
      http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_system-admin-guide/content/admin_decommission-slave-nodes-2-1.html
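
The first two steps of the common process lend themselves to scripting. The Python sketch below adds hostnames to an exclude file without creating duplicates and names the refresh command to run afterwards. The file location and hostnames are hypothetical, and the command itself is not executed here.

```python
import tempfile
from pathlib import Path

# Command to run on the NameNode host after the exclude file has been updated.
REFRESH_COMMAND = ["hdfs", "dfsadmin", "-refreshNodes"]


def add_to_exclude_file(exclude_path, hostnames):
    """Append hostnames that are not yet listed; return the ones actually added."""
    path = Path(exclude_path)
    merged = path.read_text().split() if path.exists() else []
    added = []
    for host in hostnames:
        if host not in merged:
            merged.append(host)
            added.append(host)
    # Rewrite the file with one hostname per line.
    path.write_text("".join(host + "\n" for host in merged))
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exclude = Path(tmp) / "dfs.exclude"  # hypothetical location
        print(add_to_exclude_file(exclude, ["dn01.example.com", "dn02.example.com"]))
        print(" ".join(REFRESH_COMMAND))
```

Keeping the update idempotent means the script can be rerun safely if a decommission is interrupted.
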
  54. Trend Micro DataNode Decommission (HDD hot swap)
      • Replace the crashed HDD
      • Stop the DataNode and umount the broken mount point (5 mins)
      • Reinitialize the HDD from the RAID card settings
      • Check /var/log/messages; Linux will rescan automatically
      • Remount the broken mount point
      • Start the DataNode service
  55. HBase Canary Tool
      • Contributor: Scott Miao, Trend Micro
      • Purpose: check the first region of every table on each RegionServer
      https://issues.apache.org/jira/browse/HBASE-7525
  56. HBase Canary Tool
      • Usage
      • Result
  57. HappyBase + Thrift
      • What is HappyBase
      • Purpose:
        - Check the response time of every region on a RegionServer
        - Check the response time of every region of a table
      http://happybase.readthedocs.org/en/latest/
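
A per-region response-time check with HappyBase could look roughly like the Python sketch below. It is not the speaker's tool. It times a one-row scan from each region's start key, using `Table.regions()` and `Table.scan()` from HappyBase. The Thrift host name is hypothetical, and `probe_cluster` needs the third-party `happybase` package and a reachable HBase Thrift server.

```python
import time


def time_first_row(table, row_start=None):
    """Scan at most one row from row_start and return the elapsed seconds."""
    started = time.perf_counter()
    for _ in table.scan(row_start=row_start, limit=1):
        break
    return time.perf_counter() - started


def probe_table(connection, table_name):
    """Return {region name: seconds} for every region of one table."""
    table = connection.table(table_name)
    timings = {}
    for region in table.regions():
        # The first region has an empty start key; scan from the beginning then.
        timings[region["name"]] = time_first_row(table, region["start_key"] or None)
    return timings


def probe_cluster(thrift_host):
    """Probe every table through the given Thrift server (requires happybase)."""
    import happybase  # third-party; imported here so the helpers above stay standalone

    connection = happybase.Connection(thrift_host)
    try:
        return {name: probe_table(connection, name) for name in connection.tables()}
    finally:
        connection.close()
```

Calling `probe_cluster("thrift-gateway.example.com")` (a made-up host) would return nested timings that a monitoring check could compare against a threshold.
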
  58. Datacenter & AWS
  59. How we tested the EMR POC
  60. If your EMR cluster runs 24x7
  61. Reduce EMR cost
      • 100 nodes running for 1 hour cost the same as 1 node running for 100 hours
      • AWS charges by the hour
      • If job stability does not matter, use spot instances to save cost
      • Use Reserved Instances to save cost
      • Use EMR auto scaling
      • Pilot-run your application to estimate how many machines you need and of what size
      • Get your monthly cost from the AWS calculator
      • http://calculator.s3.amazonaws.com/index.html
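
The first two points on slide 61 can be checked with a little arithmetic. The Python sketch below assumes the per-hour billing stated on the slide, with every started hour charged in full. The hourly rate is a made-up figure, not an AWS price.

```python
import math


def billed_instance_hours(node_count, runtime_hours):
    """Instance-hours charged when each started hour is billed in full."""
    return node_count * math.ceil(runtime_hours)


def cluster_cost(node_count, runtime_hours, hourly_rate):
    """Total cost for a cluster at a flat per-instance hourly rate."""
    return billed_instance_hours(node_count, runtime_hours) * hourly_rate


if __name__ == "__main__":
    rate = 0.5  # hypothetical price per instance-hour
    print(cluster_cost(100, 1, rate), cluster_cost(1, 100, rate))
```

Under hourly rounding, a job that overruns slightly, say 1.2 hours on 10 nodes, is billed as 20 instance-hours. This is why a pilot run to size the cluster is worthwhile.
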
  62. Datacenter Cost by Service (Storage)
      • Application Size / (Server HDD space * 0.75) * Server Cost / 2
  63. Datacenter Cost by Service (Computing)
      • ((Used Map slots + Used Reduce slots) / (Total Map slots + Total Reduce slots)) * Total server cost / 2
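
The two cost formulas can be written as small Python functions. The grouping of the storage formula is an interpretation: application size divided by the usable space per server (75% of the raw HDD space) gives a server count, which is multiplied by the server cost and halved. All input figures in the example are hypothetical.

```python
def storage_cost(application_size_tb, server_hdd_tb, server_cost, usable_fraction=0.75):
    """Storage share: servers needed for the data, times server cost, halved."""
    servers_needed = application_size_tb / (server_hdd_tb * usable_fraction)
    return servers_needed * server_cost / 2


def computing_cost(used_map, used_reduce, total_map, total_reduce, total_server_cost):
    """Computing share: fraction of slots used, times total server cost, halved."""
    slot_fraction = (used_map + used_reduce) / (total_map + total_reduce)
    return slot_fraction * total_server_cost / 2


if __name__ == "__main__":
    # Hypothetical figures: 30 TB of data on servers with 40 TB of raw disk each.
    print(storage_cost(30, 40, 10000))
    print(computing_cost(50, 50, 300, 100, 80000))
```

Halving each term presumably splits a server's cost evenly between storage and computing, so a service using all the disk and all the slots would be charged the full server cost.
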
  64. #TrendInsigh Thank you! We are hiring! Welcome to join Trend Micro!
