Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench

Pivotal has set up and operationalized a 1000-node Hadoop cluster called the Analytics Workbench. It takes special setup and skills to manage such a large deployment. This session shares how we set it up and how we manage it.

Published in: Technology, Business

Transcript

  • 1. Operationalizing 1000 Node Hadoop Cluster – Analytics Workbench. Clinton Ooi, Bhavin Modi. © Copyright 2013 EMC Corporation. All rights reserved.
  • 2. Agenda
      – Introduction
      – Tools: Kickstart, Parallel SSH, Puppet
      – Q & A
  • 3. Meet AWB: Introduction to the Analytics Workbench
  • 4. Vision Statement. Provide a collaborative platform that is:
      – AGILE: Support platform for proving mixed-mode enterprise readiness at scale.
      – INNOVATIVE: Showcase groundbreaking data science.
      – ACCESSIBLE: Create a shared environment for rapid innovation of big data and cloud computing technologies.
      – EDUCATIONAL: Provide a resource for educating developers, partners, and customers on big data and cloud technologies.
  • 5. Partners
      – Intel: contributed 2,000 hex-core CPUs
      – Mellanox: contributed 72 switches, 1000+ network cards, 1400+ cables
      – Micron: contributed 6,000 memory modules
      – Seagate: contributed 12,000 2TB drives
      – Supermicro: contributed 1,000+ servers
      – Switch: contributed the hosting facility in its state-of-the-art data center
      – VMware: provided operational support
  • 6. Quick facts
      – Largest Hadoop cluster of its kind
      – Operational since July 2012
      – Single multi-tenant cluster
      – Physical cluster (no virtualization)
      – 25 projects: 12 active, 8 in pipeline
  • 7. Use-case
      – Pivotal Demonstration
      – Partner Engagements
      – Industry and Academia Collaboration
  • 8. Tools: Scalable Tool Chain & Standardization
  • 9. AWB Cluster Lifecycle
  • 10. AWB Cluster Lifecycle
  • 11. Kickstart
      – Generic tool to automate OS install
      – Requires DHCP, TFTP and HTTP services
      – TFTP serves the PXELINUX HEX file, Linux kernel (vmlinuz) and in-memory file system (initrd)
      – HTTP serves the kickstart configuration (kickstart.cfg)
  • 12. Kickstart (continued): Example of PXELINUX file - /tftpboot/pxelinux.cfg/AC1C0401
        default install
        label install
          kernel centos/6.2/vmlinuz
          append initrd=centos/6.2/initrd.img ramdisk_size=9025 text console=ttyS2,115200,n,1 sshd=1 install=http://10.1.25.51/centos/6.2/os/x86_64 ks=http://10.1.25.51/centos/6.2/kickstart/conf/kickstart.cfg
        implicit 1
        display message
        prompt 1
        timeout 10
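The file name AC1C0401 in the PXELINUX example is not arbitrary: PXELINUX looks up a per-host configuration file named after the client's IPv4 address written as eight uppercase hexadecimal digits. The sketch below shows that conversion in both directions. The function names are hypothetical, and the dotted address 172.28.4.1 is simply what AC1C0401 decodes to; it does not appear in the slides.

```python
import ipaddress


def pxelinux_hex_name(ip: str) -> str:
    """Return the uppercase hex file name PXELINUX looks up for an IPv4 address."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


def ip_from_hex_name(name: str) -> str:
    """Decode an eight-digit PXELINUX hex file name back to a dotted IPv4 address."""
    return str(ipaddress.IPv4Address(int(name, 16)))


if __name__ == "__main__":
    print(pxelinux_hex_name("172.28.4.1"))  # AC1C0401
    print(ip_from_hex_name("AC1C0401"))     # 172.28.4.1
```

A generator script such as the `./kickstart` wrapper shown in the slides would need some mapping like this to decide which file under /tftpboot/pxelinux.cfg/ to write for a given node; how the actual tool does it is not stated.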
  • 13. Kickstart (continued): Example of kickstart config
        …
        url --url http://10.1.25.51/centos/6.2/os/x86_64
        ...
        %packages
        @core
        @performance
        …
        %post --log=/root/kickstart-post.log
        wget -O /root/post-install.tgz http://10.1.25.51/centos/6.2/post-install.tgz
        …
  • 14. Kickstart (continued): Generate PXELINUX and kickstart files
        [cooi@ks ~]$ ./kickstart --generate --os centos --osver 6.2 --restart pxe node0945
        Generating /tftpboot/pxelinux.cfg/AC1C0401
        Setting bootdev on node0945.sp
        Set Boot Device to pxe
        Restarting node0945.sp
        Chassis Power Control: Cycle
        [cooi@ks ~]$ for i in `seq -w 1 200`; do ./kickstart --generate --os centos --osver 6.2 --restart pxe node0$i; done
        … Skipping
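The shell loop on this slide relies on `seq -w`, which pads every number to the width of the largest one, so `node0$i` expands to node0001 through node0200. A minimal Python sketch of the same host-list expansion follows; the function name and the default prefix are illustrative assumptions, not part of the AWB tooling.

```python
from typing import List


def node_names(first: int, last: int, prefix: str = "node0") -> List[str]:
    """Expand a numeric range into host names, zero-padded like `seq -w`."""
    width = len(str(last))  # seq -w pads to the width of the widest number
    return [f"{prefix}{i:0{width}d}" for i in range(first, last + 1)]


if __name__ == "__main__":
    names = node_names(1, 200)
    print(names[0], names[-1], len(names))  # node0001 node0200 200
```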
  • 15. Kickstart (continued)
      – Enable switching or upgrading OS easily
      – Kickstart 60 nodes in ~45 minutes:
          – 1 kickstart server with software RAID5
          – 100Mbps TOR and aggregator switches
          – Saturated the 100Mbps network
      – Kickstart 200 nodes in ~45 minutes:
          – 2 kickstart servers with software RAID5
          – 100Mbps TOR switches and 1Gbps aggregator switches
      – Estimate to do >1000 nodes with full 1Gbps network
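As a rough sanity check on the first figure, one can bound how much data each node could have pulled during the install. The sketch assumes, which the slides do not state, that a single 100 Mbps link was the bottleneck and stayed fully saturated for the whole 45 minutes, and it ignores protocol overhead, so the result is an upper bound rather than a measured install size.

```python
def max_bytes_per_node(link_mbps: float, minutes: float, nodes: int) -> float:
    """Upper bound on bytes each node can receive over one shared, saturated link."""
    total_bits = link_mbps * 1_000_000 * minutes * 60  # bits moved in the window
    return total_bits / 8 / nodes


if __name__ == "__main__":
    per_node = max_bytes_per_node(link_mbps=100, minutes=45, nodes=60)
    print(f"{per_node / 1e6:.1f} MB per node at most")  # 562.5 MB per node at most
```

Under the same assumption, a full 1 Gbps path raises the ceiling tenfold, which is in line with the slide's estimate that a full 1Gbps network would be needed to kickstart more than 1000 nodes.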
  • 16. Parallel SSH: Sys admin’s lightsaber
  • 17. Parallel SSH (continued)
      – Start/Stop Hadoop services
      – Orchestrate cluster deployments
      – Perform manual cluster administration tasks
      – Pick one that is user-friendly and scalable, e.g.
          – Massh - http://m.a.tt/er/massh/
          – ClusterShell - https://github.com/cea-hpc/clustershell
          – Parallel Distributed Shell (pdsh) - https://code.google.com/p/pdsh
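The idea behind all of the parallel SSH tools listed above is a bounded fan-out: run the same command on many hosts at once, with a cap on concurrency, and collect one result per host. The sketch below illustrates that pattern only; it is not how Massh, ClusterShell or pdsh are implemented. The names `ssh_runner` and `run_on_hosts` are hypothetical, and the demo uses a stub runner so it runs without any network access.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def ssh_runner(host: str, command: str) -> int:
    """Run `command` on `host` with the system ssh client; return its exit code."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def run_on_hosts(
    hosts: Iterable[str],
    command: str,
    runner: Callable[[str, str], int] = ssh_runner,
    max_workers: int = 32,
) -> Dict[str, int]:
    """Run `command` on every host, at most `max_workers` at a time."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as host_list.
        codes = pool.map(lambda h: runner(h, command), host_list)
        return dict(zip(host_list, codes))


if __name__ == "__main__":
    # Stub runner for demonstration: pretend node0002 fails.
    def fake_runner(host: str, command: str) -> int:
        return 1 if host == "node0002" else 0

    results = run_on_hosts(["node0001", "node0002"], "uptime", runner=fake_runner)
    failed = [h for h, code in results.items() if code != 0]
    print(failed)  # ['node0002']
```

Capping `max_workers` matters at this scale: an unbounded fan-out to hundreds of nodes can overload whatever shared service the command touches.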
  • 18. Puppet
      – Configuration Management framework
      – Install and configure all applications on the cluster
      – Configure monitoring system
      – Currently running Puppet 2.7.x
  • 19. Puppet (continued)
  • 20. Puppet (continued)
  • 21. Puppet (continued)
  • 22. Puppet (continued)
  • 23. Puppet (continued)
      – Puppet sync 600 nodes in ~15 minutes:
          – Use parallel SSH tool to trigger Puppet sync across the cluster
          – 1 Puppet master with dual hex-core CPU
          – Saturated CPU on the Puppet master
      – Switch versions of Hadoop in 2 hours
      – Manifests and modules are version-controlled
  • 24. Puppet (continued)
      – One quarter to learn, deploy and design our Puppet infrastructure.
          – It is an iterative process.
      – Tasks managed outside of Puppet:
          – User account management
          – Start/Stop Hadoop services
          – Orchestrate deployment
          – Rollback/uninstall applications
  • 25. Cluster Management Tools (task / tool matrix)
      – Tools: Kickstart, Parallel SSH, Puppet, Nagios, Ganglia
      – Tasks: Install OS, Install Apps, Configure Apps, Start / Stop Services, Monitoring
  • 26. Q & A: http://www.analyticsworkbench.com
  • 27. Pivotal Sessions at EMC World (Session; Presenter; Dates/Times)
      – The Pivotal Platform: A Purpose-Built Platform for Big-Data-Driven Applications; Josh Klahr; Tue 5:30 - 6:30, Palazzo E; Wed 11:30 - 12:30, Delfino 4005
      – Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action; Noelle Sio; Tue 10:00 - 11:00, Lando 4205; Thu 8:30 - 9:30, Palazzo F
      – Pivotal: Operationalizing 1000-node Hadoop Cluster – Analytics Workbench; Clinton Ooi, Bhavin Modi; Tue 11:30 - 12:30, Palazzo L; Thu 10:00 - 11:00 am, Delfino 4001A
      – Pivotal: for Powerful Processing of Unstructured Data For Valuable Insights; SK Krishnamurthy; Mon 4:00 - 5:00, Lando 4201 A; Tue 4:00 - 5:00, Palazzo M
      – Pivotal: Big & Fast data – merging real-time data and deep analytics; Michael Crutcher; Mon 1:00 - 2:00, Lando 4201 A; Wed 10:00 - 11:00, Palazzo M
      – Pivotal: Virtualize Big Data to Make The Elephant Dance; June Yang, Dan Baskette; Mon 11:30 - 12:30, Marcello 4401A; Wed 4:00 - 5:00, Palazzo E
      – Hadoop Design Patterns; Don Miner; Mon 2:30 - 3:30, Palazzo F; Wed 8:30 - 9:30, Delfino 4005