CloudStack Best Practice in PPTV


PPTV runs CloudStack 3.0.2 in its production environment. The deployment currently has more than 150 hosts, and applications are being migrated to the cloud every day (about 10 hosts per day). By the end of 2013, the CloudStack environment is expected to have more than 1,000 hosts.

Published in: Technology, Business

  1. 1. CloudStack Best Practices in PPTV, by Dean Wei
  2. 2. About Me: OPS Architect at PPTV
     • 3 years of experience in software development and design
     • 6 years of experience as a technical consultant (infrastructure architecture design, integration, solutions, capacity planning and performance tuning) for top insurance companies (AIG, ASR, ACE, Fortis, SNS REAAL, Chubb, GEL, SBI)
     • 1 year of experience in ASP (Application Service Provider) platform architecture design, security, performance analysis and optimization, and operations
     • Current focus: automated operations architecture, cloud platform building, operation of large-scale distributed systems, performance analysis and optimization, continuous delivery, and system performance tuning
     • Sina Weibo: DeanWei
  3. 3. Agenda: Why Cloud? What is CloudStack? How to Build?
  4. 4. Overview
     • Why use cloud?
     • Why CloudStack?
     • What is CloudStack?
     • How to build a cloud-based infrastructure platform?
     • CloudStack best practices in PPTV
       o Deployment architecture
       o Network considerations and design
       o Storage considerations and design
       o Service offering considerations and design
       o Troubleshooting best practices
       o Performance tuning
  5. 5. Background And Challenge
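  6. 6. The Original Infrastructure Provisioning Process (flowchart; steps listed as extracted from the diagram)
     • App OPS requests resources
     • IDC looks up the CMDB
     • IDC initializes the OS
     • IDC installs the VM software
     • IDC creates the VM
     • Monitoring team updates Zabbix monitoring
     • App OPS updates the CMDB
     • App OPS installs the application
     • App OPS installs middleware
     • App OPS initializes tools
     • VM adjustment
     • Change control approval
     • Migrate to the release environment
     • Re-cable and migrate to the production environment
     • Configuration
     • Application goes live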
  7. 7. Problems
     A. Ties up a large number of people
     B. Many manual steps
     C. Servers are built one at a time
     D. No self-service
     E. Not usable out of the box
     F. Not elastic
     G. Path dependence
     H. Long build time
     I. Many points of failure
  8. 8. Five Characteristics of Clouds
     A. On-demand self-service
     B. Scalable
     C. Resource pooling
     D. Rapid elasticity
     E. Measured service
     Cloud technology can solve our current problems!
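  9. 9. Cloud-based Infrastructure Provisioning Process: provisioned when needed
     • App OPS requests an application environment (chooses the application template and resource size)
     • OPS accesses the Services UI
     • OPS picks the application's most recent snapshot template
     • Select available resources and verify the resource allocation (which resources are available and when they will be used)
     • Press "Start": resources are allocated and registered automatically (resource allocation, automatic VM creation, monitoring registration, and so on)
     • (Diagram labels: ERP, CRM, App1, App2)
     Benefits:
       o Out of the box
       o Parallel building
       o Self-service
       o One button for everything
       o Elastic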
  10. 10. Cloud Still Requires Architectural Design
      • Cloud computing isn't a magical solution: apps need to be able to scale out
      • Design your architecture with the end in mind
      • Make your infrastructure easily replicable
  11. 11. Popular Cloud Software Platform
  12. 12. Why CloudStack?
      • Open source: Apache 2.0 license
      • CloudStack users: it is proven and has a good track record
      • Very easy to install and get up and running
      • Fewer man-hours for implementation
      • Easy to integrate and customize
      • Matches our requirements at this stage
  13. 13. What is CloudStack?
      • Open source Infrastructure as a Service (IaaS) solution
      • Programmable data center orchestrator
      • Hypervisor agnostic
      • Supports scalable storage (Ceph, Swift, NFS)
      • Supports complex enterprise networking (e.g. firewall, load balancer, VPN, VPC…)
      • Multi-tenant
  14. 14. Core Components
      • Hosts: servers onto which services will be provisioned
      • Primary Storage: VM disk storage
      • Cluster: a grouping of hosts and their associated storage
      • Pod: a collection of clusters in the same failure boundary
      • Network: logical network associated with service offerings
      • Secondary Storage: template, snapshot and ISO storage
      • Zone: a collection of pods, network offerings and secondary storage
      • Management Server Farm: management and provisioning tasks
      • (Diagram: VMs run on hosts; hosts, network and primary storage form a cluster; clusters form a CloudStack pod; pods and secondary storage form a zone)
  15. 15. Two Types of Storage
      • Primary Storage
        o Stores disk volumes for VMs in a cluster
        o Configured at cluster level
        o Close to hosts for better performance
        o Each cluster has at least one primary storage
        o Requires high IOPS (can be expensive)
      • Secondary Storage
        o Stores all templates, ISOs and snapshots
        o Configured at zone level
        o A zone can have one or more secondary storages
        o High-capacity, low-cost commodity storage
      • (Diagram: L3 switch; Pod 1 with an L2 switch and secondary storage; Cluster 1 with Host 1, Host 2 and primary storage)
  16. 16. Deployment Architecture
      • The hypervisor is the basic unit of scale
      • A cluster consists of one or more hosts running the same hypervisor
      • All hosts in a cluster have access to shared (primary) storage
      • A pod is one or more clusters, usually with L2 switches
      • An availability zone has one or more pods and has access to secondary storage
      • One or more zones represent a cloud
      • (Diagram: Internet; Management Server cluster; Zone 1 behind an L3 switch with Pod 1 … Pod N, each pod with an L2 switch, secondary storage and Cluster 1 … Cluster N; each cluster with Host 1, Host 2 and primary storage)
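The containment hierarchy on the two slides above (zone, pod, cluster, host) can be sketched as a small data model. This is only an illustration of the relationships described in the deck; the class and field names, and the example values, are assumptions and do not mirror CloudStack's own schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str


@dataclass
class Cluster:
    name: str
    hypervisor: str        # every host in a cluster runs the same hypervisor
    primary_storage: str   # primary storage is configured per cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # templates, ISOs and snapshots live at zone level
    pods: List[Pod] = field(default_factory=list)

    def host_count(self) -> int:
        """Count the hosts in every cluster of every pod in this zone."""
        return sum(len(c.hosts) for p in self.pods for c in p.clusters)


def build_example_zone() -> Zone:
    """Build a one-pod, one-cluster, two-host zone with made-up names."""
    cluster = Cluster("cluster-1", "KVM", "primary-1",
                      [Host("host-1"), Host("host-2")])
    return Zone("zone-1", "secondary-1", [Pod("pod-1", [cluster])])
```

Walking the structure top-down (for example with `build_example_zone().host_count()`) makes the scaling units explicit: capacity grows by adding hosts to a cluster, clusters to a pod, or pods to a zone.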
  17. 17. Software Architecture (diagram; layers listed top to bottom)
      • Clients: Cloud Portal, UI, CLI, other clients
      • Management Server APIs: REST API, OAM&P API, End User API, EC2 API, other APIs, Pluggable Service API Engine
      • Access control: ACL & Authentication, Security Adapters, Console Proxy Management
      • Account Management: accounts, domains and projects; ACL and limits checking
      • Orchestration Engine: drives long-running VM operations; syncs between managed resources and the DB; generates events
      • Plugins (Services API, Plugin API, DB Access): Deployment Planning, HA, Network Gurus, Network Elements, Hypervisor Gurus, Template Connectors, Usage Calculations, additional services
      • Management services: Cluster Management, Resource Management, Job Management, Alert & Event Management, Database Access, Message Bus, Event Bus, Usage Server
      • Resource API: Hypervisor, Network, Storage, Image and Snapshot resources
  18. 18. Data And Control Flow
      • Management Servers control all resources, both virtual and physical
      • SSVMs (Secondary Storage VMs) are deployed to transfer data (templates, ISOs, snapshots) between zones
      • CPVMs (Console Proxy VMs) are deployed to carry VNC console traffic
      • VRs (Virtual Routers) are deployed for traffic to the public internet
      • The Management Server is never in the data path
      • (Diagram: a cloud spanning Data Centers 1, 2 and 3, each with a VR, an SSVM and a CPVM, plus the Management Server; templates, ISOs and snapshots are transferred between SSVMs over the internet)
  19. 19. How to Build a Cloud-based Infrastructure Platform?
      • An infrastructure management platform consists of:
        o Provisioning
        o Configuration management
        o Services orchestration
        o Monitoring and alerting
      • How to build it?
        o Architecture: a programmable infrastructure architecture
        o Open source toolchains
  20. 20. What an Infrastructure Management Platform Consists Of
      • Provisioning: installation of operating systems and other software
      • Configuration management: sets the parameters for servers; can specify initial parameters
      • Services orchestration: automates tasks across systems
      • Monitoring and alerting: records errors and the health of the infrastructure; alert services
  21. 21. A Programmable Infrastructure Architecture
  22. 22. Open Source Provisioning Tools
      Tool | Year started | License | Installation targets
      Kickstart | ? | GPL | Most .deb and RPM based Linux distros
      Cobbler (plus koan for PXE boot of VMs) | 2007 | GPL | Red Hat, OpenSUSE, Fedora, Debian, Ubuntu
      Spacewalk | 2008 | GPL | Fedora, CentOS
      Crowbar | 2011 | Apache | (Bare metal provisioning)
  23. 23. Open Source Configuration Management Tools
      Tool | Year started | Language | License | Client/server
      Cfengine | 1993 | C | Apache | Yes
      Chef | 2009 | Ruby | Apache | Chef Solo: No; Chef Server: Yes
      Puppet | 2004 | Ruby | GPL | Yes
      Salt | 2011 | Python | Apache | Yes
  24. 24. Open Source Monitoring Tools
      Tool | License | Type of monitoring | Collection methods
      Cacti / RRDTool | GPL | Performance | SNMP, syslog
      Nagios | GPL | Availability | SNMP, TCP, ICMP, IPMI, syslog
      Zabbix | GPL | Availability, performance and more | SNMP, TCP/ICMP, IPMI, synthetic transactions
      Zenoss | GPL | Availability, performance, event management | SNMP, ICMP, SSH, syslog, WMI
  25. 25. Open Source Automation/Orchestration Tools
      Tool | Year started | Language | License | Client/server | Support organization
      Capistrano | 2006 | Ruby | MIT | Yes | None
      ControlTier/RunDeck | 2010 | Java | Apache | Yes | DTO Solutions
      Func | 2007 | Python | GPL | Yes | Fedora Project
      MCollective | 2009 | Ruby | Apache | Yes | PuppetLabs
      Salt | 2011 | Python | Apache | Yes | SaltStack Inc. ?
  26. 26. Provisioning Activity Flow And Open Source Tools (diagram; layers listed top to bottom)
      • Command and control (application services orchestration and management): ControlTier, Services Portal
      • Configuration (system configuration): Zabbix, Puppet
      • Bootstrapping (OS install, VM image launch): CloudStack, Cobbler
  27. 27. Automated Toolchain in PPTV
      • Generate images: BoxGrinder
      • Bootstrapped image: Cobbler/CloudStack
      • Provision: Cobbler/CloudStack/Koan
      • Configuration: Puppet
      • Monitoring: Zabbix, Cacti
      • Services orchestration: ControlTier/Zabbix agent
      • CMDB: CMDBuild/RackTables
  28. 28. CloudStack in PPTV
      • CloudStack version: 3.0.2
      • Hypervisor: KVM
      • Host OS: CentOS 6.2
      • KVM guest OS: CentOS 5.8
      • Multiple management servers are deployed in the multi-line/BGP IDC
      • Deployed to all core IDCs and used for the non-VOD business
      • More than 150 hosts
      • Primary storage: local storage
      • Secondary storage: local NFS server and GlusterFS
      • Network: Basic Network
      • Monitoring: Zabbix
      • System configuration management: Puppet
      • Services orchestration management: ControlTier/Services Portal
      • Patches for performance, integration and stability
      • Workarounds for some issues
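  29. 29. Deployment Architecture (diagram)
      • Management farm (Management Server) in the BGP/multi-line IDC
      • BGP IDC: BGP Zone
      • Shenyang Telecom IDC: SYCB Zone
      • Shanghai Telecom IDC: SHTB Zone
      • Guangzhou Telecom IDC: GZTB Zone
      • Chengdu Telecom IDC: CDTB Zone
      • Beijing Netcom IDC: BJCB Zone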
  30. 30. Management Server Deployment Architecture (diagram)
      • User API and Admin API requests go through a load balancer
      • The load balancer fronts Management Server 1 and Management Server 2
      • The management servers use MySQL, which replicates to a slave
      • The management servers control the infrastructure resources in Zone 1, Zone 2 and Zone 3
  31. 31. Network Considerations And Design
      • Use Basic Network
      • Custom network offering for the basic network (DHCP only)
      • Disable iptables for performance (requires modifying the source code)
      • Disable security groups
      • Multi-zone design for primary storage performance
  32. 32. Storage Considerations And Design
      • Use local storage
        o One cluster maps to one host
      • Primary Storage
        o A local disk serves only one VM instance
        o Back up VM instances as templates on a schedule
        o Use the shared storage type
        o Separate application data and log data into the root volume and the data volume
      • Secondary Storage
        o Local NFS server
        o Back up data using inotify and rsync
        o Network card bonding
        o Uplink to 10G
        o Manual failover
        o GlusterFS over NFS
      • (Diagram: L3 switch; Pod 1 with an L2 switch and secondary storage; Cluster 1 with Host 1 and primary storage)
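The "back up data using inotify and rsync" item above could look roughly like the following. This is a minimal sketch, not PPTV's actual script: it assumes the `inotifywait` tool from inotify-tools and `rsync` are installed, and the paths in the usage note are hypothetical.

```python
import subprocess
from typing import List


def watch_command(src: str) -> List[str]:
    """inotifywait in monitor mode, recursive, reporting only write-like events."""
    return ["inotifywait", "-m", "-r",
            "-e", "close_write,create,delete,move", src]


def sync_command(src: str, dest: str) -> List[str]:
    """rsync in archive mode; the trailing slash copies the directory contents."""
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest]


def mirror_on_change(src: str, dest: str) -> None:
    """Run one rsync pass for every event line that inotifywait prints."""
    watcher = subprocess.Popen(watch_command(src),
                               stdout=subprocess.PIPE, text=True)
    assert watcher.stdout is not None
    for _event in watcher.stdout:
        subprocess.run(sync_command(src, dest), check=True)
```

A call such as `mirror_on_change("/export/secondary", "backup-host:/export/secondary")` (both paths made up) would keep a standby NFS server in step with the active one. Running a full rsync per event is simple but wasteful under heavy write load, so a real deployment would probably batch events before syncing.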
  33. 33. Service Offering Considerations And Design
      • Disable HA
      • Each disk offering is bound to a specific disk
      • Each compute offering is bound to a specific host and disk
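One way to bind offerings to a host and disk is through host tags and storage tags, sketched below as the parameter sets for the `createDiskOffering` and `createServiceOffering` API commands. The deck does not say how PPTV implemented the binding, so the tag scheme, the naming convention and the helper functions are assumptions; parameter availability should be checked against the API reference for the CloudStack version in use.

```python
from typing import Dict


def disk_offering_params(host: str, disk: str, size_gb: int) -> Dict[str, str]:
    """Parameters for a disk offering pinned to one disk via a storage tag."""
    tag = f"{host}-{disk}"  # hypothetical tag scheme: one tag per host disk
    return {
        "command": "createDiskOffering",
        "name": f"disk-{tag}",
        "displaytext": f"Data disk on {disk} of {host}",
        "disksize": str(size_gb),
        "tags": tag,
    }


def compute_offering_params(host: str, disk: str, cpus: int,
                            mhz: int, memory_mb: int) -> Dict[str, str]:
    """Parameters for a compute offering pinned to one host, with HA off."""
    return {
        "command": "createServiceOffering",
        "name": f"compute-{host}-{disk}",
        "displaytext": f"{cpus} x {mhz} MHz, {memory_mb} MB on {host}",
        "cpunumber": str(cpus),
        "cpuspeed": str(mhz),
        "memory": str(memory_mb),
        "storagetype": "local",   # the deck uses local primary storage
        "offerha": "false",       # the deck disables HA
        "hosttags": host,         # assumes the host carries its own name as a tag
        "tags": f"{host}-{disk}",
    }
```

Generating these dictionaries per host is what makes the scheme scriptable: the orchestration layer can create one pair of offerings for every host it registers.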
  34. 34. Provisioning Process Best Practices
      A. Install the host OS with Cobbler
      B. Install the CloudStack agent and apply system settings with Puppet
      C. Install and configure monitoring with Puppet
      D. The services orchestration system triggers scripts to register the host with CloudStack
      E. The services orchestration system triggers a script to generate disk offerings and compute offerings for the host
      F. The services orchestration system registers the host in the CMDB
      G. The host goes live
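Steps D and E above rely on scripts calling the CloudStack API, and every API call must be signed with the caller's secret key. The sketch below follows the signing procedure described in the CloudStack developer documentation (URL-encode the values, sort the parameters, lower-case the string, HMAC-SHA1, Base64); the function names and the example keys are illustrative assumptions, not PPTV's scripts.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def sign(params: Dict[str, str], secret_key: str) -> str:
    """Return the Base64 HMAC-SHA1 signature of the canonical query string."""
    # Canonical form: lower-cased "key=value" pairs, sorted by key.
    pairs = sorted((k.lower(), quote(v, safe="").lower())
                   for k, v in params.items())
    canonical = "&".join(f"{k}={v}" for k, v in pairs)
    digest = hmac.new(secret_key.encode("utf-8"),
                      canonical.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signed_query(params: Dict[str, str], api_key: str, secret_key: str) -> str:
    """Build the query string to append to the management server's API URL."""
    full = dict(params, apikey=api_key, response="json")
    query = "&".join(f"{k}={quote(v, safe='')}"
                     for k, v in sorted(full.items()))
    return query + "&signature=" + quote(sign(full, secret_key), safe="")
```

For instance, `signed_query({"command": "listHosts"}, api_key, secret_key)` yields a query string that can be sent to the management server's API endpoint; registering a host would use the `addHost` command with its zone, pod, cluster, hypervisor, URL and credential parameters in the same way.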
  35. 35. Troubleshooting Best Practices
      • Analyse log files
        o Management log: /var/log/cloud/management/
        o Agent log: /var/log/cloud/agent/
        o Adjust the log4j level for debugging
      • Read the source code
      • Understand the data models
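A first pass over the logs named above is often just counting entries per severity and pulling out the errors. The sketch below assumes the usual log4j text layout, where the level (for example `ERROR`) appears as a separate word near the start of each line; the sample file name in the usage note is also an assumption.

```python
import re
from collections import Counter
from typing import Iterable, List

# First log4j level keyword found on a line is treated as that line's level.
LEVEL_RE = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per level; lines without a level (stack traces) are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return only the ERROR and FATAL lines, without trailing newlines."""
    selected = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) in ("ERROR", "FATAL"):
            selected.append(line.rstrip("\n"))
    return selected
```

Typical use would be `count_levels(open("/var/log/cloud/management/management-server.log"))`, followed by `error_lines(...)` on the same file once the counts show where to look.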
  36. 36. Performance Tuning
      • BIOS settings for the KVM host (Dell PowerEdge servers):
        A. Set the Power Management Mode to Maximum Performance
        B. Set the CPU Power and Performance Management Mode to Maximum Performance
        C. Processor Settings: set Turbo Mode to enabled
        D. Processor Settings: set C States to disabled
  37. 37. Performance Tuning (contd): CloudStack Tuning
      • NFS server tuning
        o Use NFSv4
        o Mount options: noatime,nodiratime,noacl,data=writeback,commit=15
        o IDE/SATA parameters
        o NIC and TCP/IP
        o Use GlusterFS
      • Management server tuning
        o Increase the number of worker processes
        o Turn off stats collectors
        o Tune the allocation algorithm
        o Tune the direct agent load size
        o MySQL DB tuning
      • JVM tuning
        o Heap size tuning
        o Use the CMS GC algorithm
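The mount options listed above are easy to lose when a server is rebuilt, so a small check can help. The sketch below parses text in the `/etc/fstab` or `/proc/mounts` column layout and reports which of the listed options are missing for a mount point. Treating them as options of the filesystem that backs the NFS export is an interpretation, and the device and mount point in the usage note are hypothetical. The kernel may also display active options differently in `/proc/mounts` than they were written in fstab.

```python
from typing import Iterable, List

# Options taken verbatim from the slide.
REQUIRED_OPTIONS = ("noatime", "nodiratime", "noacl",
                    "data=writeback", "commit=15")


def missing_options(mounts_text: str, mount_point: str,
                    required: Iterable[str] = REQUIRED_OPTIONS) -> List[str]:
    """Return the required options absent from the entry for mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and not line.lstrip().startswith("#") \
                and fields[1] == mount_point:
            present = set(fields[3].split(","))
            return sorted(set(required) - present)
    raise LookupError(f"no entry for mount point {mount_point!r}")
```

For example, `missing_options(open("/etc/fstab").read(), "/export/secondary")` returns an empty list when every option from the slide is configured for that mount.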
  38. 38. Performance Tuning (contd)
      • KVM tuning
        o CPU
          - Disable KSM on the KVM host
          - Disable tickless mode in the KVM guest
          - Pin CPUs on the KVM host
        o Memory: THP on the KVM host
          - echo yes > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
          - echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
          - echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
        o Network performance issue in CentOS 6.2
          - Workaround: blacklist vhost-net. Edit /etc/modprobe.d/blacklist-kvm.conf and include vhost-net.
      • Linux kernel parameter tuning
        o TCP buffer tuning
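Settings written with `echo` do not survive a reboot, so it can be useful to verify them. The sketch below reads the three THP files named on the slide and compares the active value with the value the slide sets. It assumes the sysfs convention of marking the active choice with square brackets (for example `[always] never`); the function names are illustrative.

```python
import re
from pathlib import Path
from typing import Dict

# Directory and target values as given on the slide (RHEL/CentOS 6 path).
THP_DIR = Path("/sys/kernel/mm/redhat_transparent_hugepage")
WANTED = {"enabled": "always", "defrag": "never", "khugepaged/defrag": "yes"}


def active_choice(text: str) -> str:
    """Return the bracketed choice, or the stripped text if nothing is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()


def check_thp(base: Path = THP_DIR,
              wanted: Dict[str, str] = WANTED) -> Dict[str, bool]:
    """Map each existing THP file to whether it holds the wanted value."""
    report = {}
    for relative, expected in wanted.items():
        path = base / relative
        if path.exists():  # other kernels keep these files elsewhere
            report[relative] = active_choice(path.read_text()) == expected
    return report
```

Running `check_thp()` on a KVM host returns a dictionary such as one entry per file found; any `False` value points at a setting that has drifted from the slide's recommendation.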
  39. 39. Q&A