PPTV runs CloudStack 3.0.2 in its production environment. It currently has more than 150 hosts and migrates applications to the cloud every day (10 hosts per day). By the end of 2013, there will be more than 1000 hosts in the CloudStack environment.
CloudStack Best Practice in PPTV
1. CloudStack Best Practices In PPTV
DeanWei
2. About Me
OPS Architect at PPTV
• 3 years of experience in software development and design
• 6 years of experience as a technical consultant (infrastructure
architecture design, integration, solutions, capacity
planning and performance tuning) for top insurance
companies (AIG, ASR, ACE, Fortis, SNS REAAL, Chubb, GEL, SBI)
• 1 year of experience in ASP (Application Service Provider)
platform architecture design, security, performance
analysis and optimization, and operations
• Current focus: automated operations architecture,
cloud platform building, large-scale distributed system
operations, performance analysis and optimization,
continuous delivery, and system performance tuning
SINA WEIBO (DeanWei) : http://weibo.com/deanw
4. Overview
Why Use Cloud?
Why CloudStack?
What is CloudStack?
How to build a Cloud-Based Infrastructure Platform?
CloudStack Best Practices In PPTV
Deployment Architecture
Network Considerations And Design
Storage Considerations And Design
Services Offering Considerations And Design
Troubleshooting Best Practices
Performance Tuning
7. Problems
A. Ties up a large number of people
B. Many manual steps
C. Servers are built one at a time
D. No self-service
E. Not usable out of the box
F. Not elastic
G. Path dependence
H. Long build times
I. Many points of failure
8. Five Characteristics of Clouds
A. On-Demand Self-Service
B. Scalable
C. Resource Pooling
D. Rapid Elasticity
E. Measured Service
Cloud technology can solve our current problems!
9. Cloud-based Infrastructure Provisioning Processes
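Provisioned when needed
A. App OPS requests an application environment (resource allocation is verified)
B. OPS accesses the OPS Services UI (selects the application template and resource size)
C. OPS picks the application's most recent snapshot template
D. OPS selects the available resources (which resources are available and when to use them)
E. OPS presses "Launch"
F. Resources are allocated and registered automatically (resource allocation, automatic VM creation, monitoring registration, etc.)
Applications built in parallel: ERP, CRM, App1, App2
o Out of the box
o Parallel building
o Self Service
o One-button for All
o Elastic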
10. Cloud Still Requires Architectural Design
Cloud computing isn’t a magical solution: apps need to be
able to scale out
Design your architecture with the end in mind
Make your infrastructure easily replicable
12. Why CloudStack?
Open Source: Apache 2.0
CloudStack users (it is proven and has a good track record)
Very easy to install and get up and running
Fewer man-hours for implementation
Easy to integrate and customize
Matches our requirements at this stage
13. What is CloudStack?
Open source Infrastructure as a Service (IaaS) solution.
Programmable Data Center orchestrator
Hypervisor agnostic
Supports scalable storage (Ceph, Swift, NFS)
Supports complex enterprise networking (e.g. firewall, load balancer, VPN,
VPC…)
Multi-tenant
14. Core Components
Hosts
o Servers onto which services will be provisioned
Primary Storage
o VM disk storage
Cluster
o A grouping of hosts and their associated storage
Pod
o Collection of clusters in the same failure boundary
Network
o Logical network associated with service offerings
Secondary Storage
o Template, snapshot and ISO storage
Zone
o Collection of pods, network offerings and secondary storage
Management Server Farm
o Management and provisioning tasks
[Diagram labels: VM, Host, Network, Primary Storage, Cluster, Secondary Storage, CloudStack Pod, Zone]
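The containment relationships on this slide can be written down as a small data model. The sketch below is only an illustration of the hierarchy (zone, pod, cluster, host); the class names, fields and the example values are assumptions made for this example and are not CloudStack's own schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str  # a server onto which VMs are provisioned


@dataclass
class Cluster:
    name: str
    primary_storage: str  # VM disk storage associated with the cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str  # clusters in the same failure boundary
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # templates, snapshots and ISOs
    pods: List[Pod] = field(default_factory=list)

    def host_count(self) -> int:
        """Count the hosts in every cluster of every pod in this zone."""
        return sum(len(c.hosts) for p in self.pods for c in p.clusters)


# Hypothetical example values.
zone = Zone(
    name="zone1",
    secondary_storage="nfs://secondary.example.internal/export",
    pods=[Pod("pod1", [Cluster("cluster1", "primary1", [Host("host1"), Host("host2")])])],
)
print(zone.host_count())
```

Walking the tree from the zone down mirrors how capacity is usually reasoned about: hosts are added to clusters, clusters to pods, and pods to zones.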
15. Two Types of Storage
Primary Storage
• Stores disk volumes for VMs in a cluster
• Configured at cluster level
• Close to hosts for better performance
• Each cluster has at least one primary storage
• Requires high IOPS (can be expensive)
Secondary Storage
• Stores all templates, ISOs and snapshots
• Configured at zone level
• A zone can have one or more secondary storages
• High-capacity, low-cost commodity storage
[Diagram labels: L3 switch, Pod 1, L2 switch, Secondary Storage, Cluster 1, Host 1, Host 2, Primary Storage]
16. Deployment Architecture
Hypervisor is the basic unit of scale.
Cluster consists of one or more hosts of the same hypervisor.
All hosts in a cluster have access to shared (primary) storage.
Pod is one or more clusters, usually with L2 switches.
Availability Zone has one or more pods and has access to secondary storage.
One or more zones represent a cloud.
[Diagram labels: Internet, Management Server Cluster, Zone 1, L3, Pod 1 … Pod N, L2, Secondary Storage, Cluster 1 … Cluster N, Host 1, Host 2, Primary Storage]
17. Software Architecture
[Diagram: Management Server software architecture, listed top to bottom]
Clients: Cloud Portal, UI, CLI, Other Clients
Management Server
REST API: OAM&P API, End User API, EC2 API, Other APIs, Pluggable Service API Engine
Console Proxy Management, ACL & Authentication, Security Adapters
Account Management
- Accounts, Domains, and Projects
- ACL, limits checking
Template Access, Connectors
Services API, Plugin API
Orchestration Engine
- Drives long running VM operations
- Syncs between resources managed and DB
- Generates events
Deployment Planning, HA, Network Gurus, Network Elements, Hypervisor Gurus
Additional Services: Usage Calculations
Cluster Management, Resource Management, Job Management, Alert & Event Management, Database Access
Message Bus, Event Bus
Resource API: Hypervisor Resources, Network Resources, Storage Resources, Image Resources, Snapshot Resources
Outside the Management Server: DB, Usage Server
18. Data And Control Flow
Cloud Management Servers control all resources, both virtual and physical
SSVMs deployed to transfer data between zones (templates, ISOs, snapshots)
CPVMs deployed to transfer VNC console traffic
VR deployed for traffic into public internet
Management Server is never in the data path
[Diagram labels: Management Server, Data Center 1, Data Center 2, Data Center 3, Internet; VR, SSVM and CPVM instances in the data centers]
19. How to build A Cloud-based infrastructure Platform?
An infrastructure management platform consists of:
Provisioning
Configuration Management
Services Orchestration
Monitoring And Alert
How to build ?
Architecture
A programmable infrastructure architecture
Open Source ToolChains
20. An Infrastructure Management Platform Consists Of
Provisioning
o Installation of operating systems and other software
Configuration Management
o Sets the parameters for servers, can specify initialized parameters
Services Orchestration
o Automates tasks across systems
Monitoring And Alert
o Records errors and health of infrastructure
o Alert services
22. Open Source Provisioning Tools
Tool                                      Year Started   License   Installation Targets
Kickstart                                 ?              GPL       Most .deb and RPM based Linux distros
Cobbler (plus koan for PXE boot of VMs)   2007           GPL       Red Hat, OpenSUSE, Fedora, Debian, Ubuntu
Spacewalk                                 2008           GPL       Fedora, CentOS
Crowbar                                   2011           Apache    (Bare metal provisioning)
23. Open Source Configuration Management Tools
Tool       Year Started   Language   License   Client/Server
Cfengine   1993           C          Apache    Yes
Chef       2009           Ruby       Apache    Chef Solo: No; Chef Server: Yes
Puppet     2004           Ruby       GPL       Yes
Salt       2011           Python     Apache    Yes
24. Open Source Monitoring Tools
Tool              License   Type of Monitoring                            Collection Methods
Cacti / RRDTool   GPL       Performance                                   SNMP, syslog
Nagios            GPL       Availability                                  SNMP, TCP, ICMP, IPMI, syslog
Zabbix            GPL       Availability/Performance and more             SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss            GPL       Availability, Performance, Event Management   SNMP, ICMP, SSH, syslog, WMI
25. Open Source Automation/Orchestration Tools
Tool                  Year Started   Language   License   Client/Server   Support Organization
Capistrano            2006           Ruby       MIT       Yes             None
ControlTier/RunDeck   2010           Java       Apache    Yes             DTO Solutions
Func                  2007           Python     GPL       Yes             Fedora Project
MCollective           2009           Ruby       Apache    Yes             PuppetLabs
Salt                  2011           Python     Apache    Yes             SaltStack Inc. ?
26. Provisioning Activity Flow And Open Source Tools
[Diagram: provisioning activity layers, top to bottom, with the tool used at each layer]
Command and Control: Application Services Orchestration And Management (ControlTier, Services Portal, Zabbix)
Configuration: System Configuration (Puppet)
Bootstrapping: VM Image Launch (CloudStack), OS Install (Cobbler)
28. CloudStack In PPTV
CS Version : 3.0.2
Hypervisor : KVM
Host OS : CentOS 6.2
KVM Guest OS : CentOS 5.8
Multiple management servers are deployed in the multi-line/BGP IDC
Deployed in all core IDCs and used for non-VOD business
More than 150 hosts
Primary Storage : Local storage
Secondary Storage : Local NFS server and GlusterFS
Network : Basic Network
Monitoring : Zabbix
System configuration management : Puppet
Services orchestration management : ControlTier/Services Portal
Patches for performance, integration and stability
Workarounds for some issues
29. Deployment Architecture
BGP/Multi-line Management Farm
BGP IDC: BGP Zone (Management Server)
Shenyang Telecom IDC: SYCB Zone
Shanghai Telecom IDC: SHTB Zone
Guangzhou Telecom IDC: GZTB Zone
Chengdu Telecom IDC: CDTB Zone
Beijing Netcom IDC: BJCB Zone
30. Management Server Deployment Architecture
User API, Admin API -> Load Balancer -> Management Server1, Management Server2
Management Server1, Management Server2 -> MySQL -> (Replication) -> MySQL Slave
Management Server1, Management Server2 -> Infrastructure Resources (Zone1, Zone2, Zone3)
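Clients reach the management servers through the load balancer via the CloudStack API, and each request has to be signed with the caller's secret key. The sketch below shows the commonly documented signing scheme (sort the parameters, URL-encode the values, lowercase the string, HMAC-SHA1, base64). The endpoint host name and both keys are made-up placeholders, and the helper names are chosen for this example only.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def canonical_query(params: dict) -> str:
    """Sort parameters by lowercased name and URL-encode each value."""
    items = sorted(params.items(), key=lambda kv: kv[0].lower())
    return "&".join(f"{key}={quote(str(value), safe='')}" for key, value in items)


def sign_request(params: dict, secret_key: str) -> str:
    """HMAC-SHA1 over the lowercased canonical query, base64-encoded."""
    message = canonical_query(params).lower().encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def build_url(endpoint: str, params: dict, secret_key: str) -> str:
    """Append the URL-encoded signature to the canonical query string."""
    signature = quote(sign_request(params, secret_key), safe="")
    return f"{endpoint}?{canonical_query(params)}&signature={signature}"


# Hypothetical load balancer address and credentials, for illustration only.
ENDPOINT = "http://lb.example.internal:8080/client/api"
params = {"command": "listZones", "apikey": "EXAMPLE_API_KEY", "response": "json"}
print(build_url(ENDPOINT, params, "EXAMPLE_SECRET_KEY"))
```

Because every management server behind the load balancer shares the same database, a signed request should be valid regardless of which server receives it.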
31. Network Considerations And Design
Using Basic Network
Custom network offering for Basic Network (DHCP only)
Disable iptables for performance (requires modifying the source
code)
Disable Security Groups
Multi-zone design for primary storage performance
32. Storage Considerations And Design
Primary Storage
  Use Local Storage
  A cluster maps to a host
  A local disk serves only one VM instance
  Back up VM instances as templates on a schedule
  Using shared storage type
  Separate application data and log data onto Root Volume and Data Volume
Secondary Storage
  Local NFS Server
    Back up data using inotify and rsync
    Network card bonding
    Up-link to 10G
    Manual failover
  GlusterFS over NFS
[Diagram labels: L3 switch, Pod 1, L2 switch, Secondary Storage, Cluster 1, Host 1, Primary Storage]
33. Services Offering Considerations And Design
Disable HA
Each disk offering is bound to a specific disk
Each compute offering is bound to a specific host and disk
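One way such a binding could be expressed is with host tags and storage tags on the offerings. The slides do not say how PPTV implements it, so the sketch below is an assumption: it only builds parameter dictionaries for the `createServiceOffering` and `createDiskOffering` API commands, and the tag naming scheme, function name and example sizes are invented for illustration.

```python
def offerings_for_host(host_name: str, disk_name: str, cpu_number: int,
                       cpu_speed_mhz: int, memory_mb: int, disk_size_gb: int) -> dict:
    """Build API parameters that tie offerings to one host and one of its disks."""
    # Hypothetical tag naming scheme; the same tags would also have to be set
    # on the host and on its storage for the allocator to honour them.
    host_tag = f"host-{host_name}"
    storage_tag = f"disk-{host_name}-{disk_name}"

    compute = {
        "command": "createServiceOffering",
        "name": f"compute-{host_name}-{disk_name}",
        "displaytext": f"Compute offering for {host_name} on {disk_name}",
        "cpunumber": cpu_number,
        "cpuspeed": cpu_speed_mhz,
        "memory": memory_mb,
        "storagetype": "local",   # primary storage is local in this deployment
        "hosttags": host_tag,     # restricts placement to the tagged host
        "tags": storage_tag,      # restricts the root disk to the tagged storage
        "offerha": "false",       # HA is disabled on this slide
    }
    disk = {
        "command": "createDiskOffering",
        "name": f"disk-{host_name}-{disk_name}",
        "displaytext": f"Data disk on {host_name} {disk_name}",
        "disksize": disk_size_gb,
        "tags": storage_tag,
    }
    return {"compute": compute, "disk": disk}


# Hypothetical host and disk names.
result = offerings_for_host("kvm01", "sdb", cpu_number=4, cpu_speed_mhz=2000,
                            memory_mb=8192, disk_size_gb=200)
print(result["compute"]["hosttags"], result["disk"]["tags"])
```

Generating one pair of offerings per host and disk keeps placement fully deterministic, at the cost of a large number of offerings to manage.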
34. Provisioning Processes Best Practices
A. Install the host OS with Cobbler
B. Install the CS agent and apply system settings with Puppet
C. Install and configure monitoring with Puppet
D. The services orchestration system triggers scripts to register the host with CS
E. The services orchestration system triggers a script to generate disk
offerings and compute offerings for the host
F. The services orchestration system registers the host in the CMDB
G. The host goes live
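The steps above form an ordered pipeline in which a host should only go live once every earlier step has succeeded. A minimal sketch of that control flow follows; the step functions are placeholders that do nothing, and the runner, step labels and host name are assumptions for illustration rather than ControlTier's actual job model.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], bool]]


def run_pipeline(host: str, steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action(host):
            raise RuntimeError(f"step {name!r} failed for {host}; completed: {completed}")
        completed.append(name)
    return completed


def placeholder(host: str) -> bool:
    """Stand-in for a real action (Cobbler, Puppet or an API call)."""
    return True


STEPS: List[Step] = [
    ("A. install host OS (Cobbler)", placeholder),
    ("B. install CS agent and system settings (Puppet)", placeholder),
    ("C. install and configure monitoring (Puppet)", placeholder),
    ("D. register host with CS", placeholder),
    ("E. generate disk and compute offerings", placeholder),
    ("F. register host in CMDB", placeholder),
    ("G. go live", placeholder),
]

print(run_pipeline("kvm-host-01", STEPS))
```

Raising on the first failure, with the list of completed steps in the message, makes it clear where a half-provisioned host stopped.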
35. Troubleshooting Best Practices
Analyse Log files
Management Log : /var/log/cloud/management/
Agent Log : /var/log/cloud/agent/
Adjust log4j level for debugging
Source Code
Data Models
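As a starting point for log analysis, a small script can pull the warning, error and exception lines out of the two log directories listed above. This is a sketch only: the regular expression and the `*.log` file pattern are assumptions, and real troubleshooting usually also needs the lines around each hit.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

# Assumed pattern: log4j level names plus any Java exception class name.
PATTERN = re.compile(r"\b(ERROR|WARN)\b|Exception")


def scan_log(path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every line that matches PATTERN."""
    with path.open(errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if PATTERN.search(line):
                yield lineno, line.rstrip("\n")


def scan_dir(directory: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every *.log file in a directory; files without hits are omitted."""
    root = Path(directory)
    results: Dict[str, List[Tuple[int, str]]] = {}
    if not root.is_dir():
        return results
    for path in sorted(root.glob("*.log")):
        hits = list(scan_log(path))
        if hits:
            results[path.name] = hits
    return results


# Directories taken from the slide; missing directories are skipped.
for log_dir in ("/var/log/cloud/management/", "/var/log/cloud/agent/"):
    for name, hits in scan_dir(log_dir).items():
        print(f"{log_dir}{name}: {len(hits)} suspicious lines")
```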
36. Performance Tuning
BIOS Settings for KVM Host
For Dell PowerEdge servers:
A. Set the Power Management Mode to Maximum
Performance.
B. Set the CPU Power and Performance Management
Mode to Maximum Performance.
C. Processor Settings: set Turbo Mode to enabled.
D. Processor Settings: set C States to disabled.
37. Performance Tuning (contd)
CS Tuning
  NFS Server Tuning
    Use NFSv4
    Mount options: noatime,nodiratime,noacl,data=writeback,commit=15
    IDE/SATA parameters
    NIC & TCP/IP
    Use GlusterFS
  Management Server Tuning
    Increase the number of worker processes
    Turn off stats collectors
    Tune the allocation algorithm
    Tune the direct agent load size
    MySQL DB tuning
    JVM Tuning
      Heap size tuning
      Use the CMS GC algorithm
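The option string on this slide looks like a set of ext3/ext4 mount options for the file system backing the NFS export; that reading is an assumption. Under it, a small check can report which of the options are missing from an fstab-style entry. The device, mount point and sample line below are hypothetical.

```python
# Options taken from the slide.
WANTED = {"noatime", "nodiratime", "noacl", "data=writeback", "commit=15"}


def missing_options(fstab_text: str, mount_point: str) -> set:
    """Return the wanted options that are absent from the entry for mount_point."""
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab layout: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and not line.lstrip().startswith("#") and fields[1] == mount_point:
            return WANTED - set(fields[3].split(","))
    raise LookupError(f"no entry for {mount_point}")


# Hypothetical fstab entry for the export directory.
SAMPLE = "/dev/sdb1 /export/secondary ext4 noatime,nodiratime,data=writeback 0 0"
print(sorted(missing_options(SAMPLE, "/export/secondary")))
```

Note that `data=writeback` trades crash consistency for throughput, so it is worth confirming that trade-off is acceptable for the data being stored.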
38. Performance Tuning (contd)
KVM Tuning
  CPU
    Disable KSM in KVM host
    Disable tickless mode in KVM guest
    Pin CPUs in KVM host
  Memory
    THP in KVM host
      echo 'yes' > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
      echo 'always' > /sys/kernel/mm/redhat_transparent_hugepage/enabled
      echo 'never' > /sys/kernel/mm/redhat_transparent_hugepage/defrag
  Network performance issue in CentOS 6.2
    Workaround: blacklist vhost-net. Edit /etc/modprobe.d/blacklist-kvm.conf and
    add a "blacklist vhost-net" line.
  Linux kernel parameters tuning
    TCP Buffer Tuning
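Settings written to sysfs with `echo` do not survive a reboot, so it helps to verify them on each KVM host. The sketch below reads the transparent hugepage files named on this slide plus `/sys/kernel/mm/ksm/run` (where 0 means KSM is stopped) and compares them with the values the slide recommends. The function names and the report layout are assumptions for this example.

```python
from pathlib import Path

# Expected values taken from the slide; ksm/run == "0" corresponds to "Disable KSM".
EXPECTED = {
    "redhat_transparent_hugepage/enabled": "always",
    "redhat_transparent_hugepage/defrag": "never",
    "redhat_transparent_hugepage/khugepaged/defrag": "yes",
    "ksm/run": "0",
}


def active_choice(text: str) -> str:
    """Return the bracketed value from a line such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()  # plain values such as '0' have no brackets


def check(root: Path = Path("/sys/kernel/mm")) -> dict:
    """Map each setting to (actual, expected); actual is None if the file is absent."""
    report = {}
    for rel, expected in EXPECTED.items():
        path = root / rel
        actual = active_choice(path.read_text()) if path.is_file() else None
        report[rel] = (actual, expected)
    return report


for rel, (actual, expected) in check().items():
    status = "ok" if actual == expected else "differs"
    print(f"{rel}: actual={actual} expected={expected} ({status})")
```

The `redhat_transparent_hugepage` directory is specific to RHEL/CentOS 6 kernels; on other kernels the files will simply be reported as absent.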