Integrating OpenStack to Existing Infrastructure

         Cheng, Hui
      freedomhui@gmail.com
          2012-04-19
                                      1
Agenda
Background
●   Who We Are
●   Infrastructure & Platform
●   Challenges

Integration Challenges
●   Network Deployment
●   Security Consideration
●   Load Balancer
●   Swift Evaluation

Our Contributions
●   Billing
●   Monitoring


                                         2
Who Are We
                                      Sina.com
                                      • Largest infotainment web portal in China
                                      • Provides various online services such as news, finance,
                                      video, email and blog hosting
                                      • Operates the first PaaS cloud computing platform




Sina Weibo
• Twitter-like microblog service
• Over 300 million users
• Huge influence on China's society



             We are building a reliable, scalable and secure
          infrastructure and platform to support our business.

                                                                                             3
Infrastructure & Platform
Physical Servers
Traditional Operation

Virtualization Platform(IaaS)
●VM Management System(VMMS) → Sina Web Service(SWS)

●VMMS is a private solution developed in-house

●SWS is based on OpenStack




Application Platform(PaaS)
●Virtual Host → Sina App Engine(SAE)

●SAE provides both Public and Private Service.

●Proved to be Efficient and Robust




                                                 4
Sina App Engine
• No. 1 Public PaaS Platform in China, launched in Nov 2009
• PHP, Python, Java and Ruby support
• Numbers
160,000+ developers
200,000+ apps on SAE
800 million page views per day
20+ Services
• SAE Cloud Storage Service is replaced by Swift
• Deploy SAE on OpenStack



                                                   5
Challenges

SAE meets the majority of business needs but does not cover
all of them, especially for web games

Customers require a full stack of cloud computing services
We chose OpenStack as our IaaS solution




                                                               8
Why Choose OpenStack



  100% Python & Open Source




                              9
OpenStack Deployment
[Diagram: deployment architecture. Components shown:]
• dashboard, nova-api
• keystone, Sina SSO
• Rabbit, MySQL, schedule
• nova-compute + nova-network (shown twice, paired)
• glance, Swift




                                                                         10
Nova Network
Networking is the biggest challenge for IaaS
Network Topology:
•   VLAN
•   FlatDHCP
•   FlatDHCP & Multihost




                                                11
Network Topology --- VLAN
Capability:
• Accessibility of VMs within one tenant
• Isolation of VMs from different tenants
• VM is able to access the public network
• VM is accessible from the public network
• Isolation between the virtual network and
  the internal network

Drawback:
• Networks must be pre-allocated for future projects
• Traffic bottleneck in the NAT gateway
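
The pre-allocation drawback can be illustrated with a minimal sketch. It assumes one subnet and one VLAN ID per project, carved out of a single fixed range up front. The function name, its parameters and the address range are hypothetical and are not taken from the slides or from nova.

```python
import ipaddress


def preallocate_networks(fixed_range, num_networks, network_size, vlan_start):
    """Split fixed_range into equally sized subnets, one VLAN ID per subnet.

    All names here are illustrative; this is not nova's implementation.
    """
    supernet = ipaddress.ip_network(fixed_range)
    # Prefix length whose subnets hold at least network_size addresses.
    new_prefix = 32 - (network_size - 1).bit_length()
    subnets = supernet.subnets(new_prefix=new_prefix)
    plan = []
    for index in range(num_networks):
        try:
            subnet = next(subnets)
        except StopIteration:
            raise ValueError("fixed range too small for requested networks") from None
        plan.append({"vlan": vlan_start + index, "cidr": str(subnet)})
    return plan


# Networks exist before any project asks for them.
plan = preallocate_networks("10.0.0.0/16", num_networks=4,
                            network_size=256, vlan_start=100)
for entry in plan:
    print(entry)
```

Every entry in the plan consumes address space and a VLAN ID whether or not a project ever uses it, which is the cost the slide points at.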




                                              12
Network Topology(Flat)
Capability:
• Accessibility of all VMs in the fixed IP range
• VM is able to access the public network
• VM is accessible from the public network
• Full isolation between the virtual network and
  the internal network


Drawback:
• Weaker tenant isolation
• Traffic bottleneck in the NAT gateway




                                                   13
Network Topology(Flat & Multihost)
Capability:
• Accessibility of all VMs in the fixed IP range
• VM is able to access the public network
• VM is accessible from the public network

Bonus:
• Totally distributed architecture avoids
  a single point of failure
• Multiple gateways eliminate the NAT bottleneck
• High throughput between OS regions

Drawback:
• Weaker tenant isolation
• Needs a security facility(SWS-filter) to protect
   the intranet



           If security problems were solved, this would be our best choice!

                                                                              14
Security in OpenStack
Security Group --- Layer 3 Filter
Role-based firewall
  One security group is a Role
Ingress filtering
  Target is the instance
  Source can be CIDR or another group
Implemented by iptables
  See details: iptables -t filter -n -L
  Whitelist mechanism(ACCEPT rules)

Static filters --- Layer 2 Filter
MAC, IP, and ARP spoofing protection
  Not configurable
  Defined in /etc/libvirt/nwfilter/*.xml
Implemented by ebtables
  ebtables -t nat --list
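
The security group semantics can be sketched in a few lines of Python: ingress only, source given as a CIDR or as another group, and anything without a matching ACCEPT rule is refused. The `Rule` class, the `ingress_allowed` function and the sample rules are hypothetical illustrations, not nova's data model, and the real enforcement is done by iptables.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One ACCEPT rule of a security group (illustrative model only)."""
    protocol: str
    port: int
    source_cidr: Optional[str] = None   # e.g. "0.0.0.0/0"
    source_group: Optional[str] = None  # name of another security group


def ingress_allowed(rules, protocol, port, source_ip, source_groups):
    """Return True if any rule accepts the packet; otherwise refuse it."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol != protocol or rule.port != port:
            continue
        if rule.source_cidr and src in ipaddress.ip_network(rule.source_cidr):
            return True
        if rule.source_group and rule.source_group in source_groups:
            return True
    return False  # whitelist: no matching ACCEPT rule means no access


# A "web" role: HTTP from anywhere, MySQL only from members of group "app".
web_rules = [
    Rule("tcp", 80, source_cidr="0.0.0.0/0"),
    Rule("tcp", 3306, source_group="app"),
]
print(ingress_allowed(web_rules, "tcp", 3306, "10.0.0.7", {"app"}))
```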




                                                                               15
Security Enhancement
SWS Filter

Prevent Intranet Penetration
• The intranet is the internal network outside of
  OpenStack
Egress filtering
• Target is the internal network
• Source is the instances in OpenStack
Implementation
• Whitelist mechanism(ACCEPT rules)
• At the top of the nova-filter-top Forward
  Chain

Rationale
• SWS filter is managed by the cloud manager
• Only explicitly authorized packets can reach the internal network
• Packets should be controlled within the Compute Node


                                                                  16
Security Enhancement
Security Group VS SWS Filter




                                     17
Load Balancer
Design

Load Balance
• Dispatch requests
• Support multiple routing algorithms
• Health check

Acceleration
• Reality: narrow bandwidth between ISPs
• Build fiber channels from the ISPs to a pivot
• Smart DNS gives each user an endpoint within the user's own ISP

IPv4 Shortage
• Reality: dozens of public IPs support
   hundreds of VMs
• IPv4 has been exhausted
• IPv6 is not realistic yet in China

[Diagram: DNS Acceleration Design. Smart DNS directs users on the Public Network to their ISP (Telecom, Unicom, Mobile, Others ISP); each ISP reaches the Pivot over a high speed fiber channel.]
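
The Smart DNS step of the acceleration design might look like the sketch below: the resolver maps the client address to an ISP and answers with an endpoint inside that ISP. The address ranges, endpoint addresses and function name are made up for illustration (documentation-only IP ranges are used) and do not describe the actual deployment.

```python
import ipaddress

# Hypothetical client ranges per ISP (documentation address space).
ISP_RANGES = {
    "telecom": [ipaddress.ip_network("198.51.100.0/25")],
    "unicom": [ipaddress.ip_network("198.51.100.128/25")],
}

# Hypothetical load balancer endpoints reachable inside each ISP.
ENDPOINTS = {
    "telecom": "192.0.2.10",
    "unicom": "192.0.2.20",
    "default": "192.0.2.30",
}


def resolve(client_ip):
    """Answer with the endpoint that sits in the client's own ISP."""
    address = ipaddress.ip_address(client_ip)
    for isp, networks in ISP_RANGES.items():
        if any(address in network for network in networks):
            return ENDPOINTS[isp]
    return ENDPOINTS["default"]  # unknown ISP: fall back to a shared endpoint


print(resolve("198.51.100.7"))
```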



                                                                                     18
Load Balancer
Layer 7 Load Balancer
Consideration:
1. dispatch request by Host header
2. nginx module




                                      19
Load Balancer
Layer 4 Load Balancer
Consideration:
1. dispatch request by TCP port
2. lvs + haproxy




                                      20
Swift Evaluation
   Extremely Durable and Highly Available
   Superior Scalability
   Linear Growth of Performance
   Symmetric Architecture
   No Single Point of Failure
   Simple & Reliable




                                             21
Swift Evaluation
• 1 Zone = 1 Physical Server with 12 x 2 TB disks
• Write/Read applies quorum protocol

[Diagram: client requests (PUT abc.png, GET abc.png) pass through a Load Balancer to five zones, Zone1 to Zone5. Every zone runs a Proxy Server, an Object Server, a Container Server and an Account Server.]
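
As a rough illustration of the quorum idea, the sketch below treats an operation as successful once a simple majority of replicas acknowledge it. The slide does not say how Swift counts responses, so the majority rule and the function names are assumptions; the default of 3 replicas matches the test setup reported later in the deck.

```python
def quorum_size(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def operation_succeeds(acks, replicas=3):
    """acks holds one boolean per replica: did that replica acknowledge?"""
    if len(acks) != replicas:
        raise ValueError("expected one acknowledgement flag per replica")
    return sum(acks) >= quorum_size(replicas)


print(quorum_size(3))
print(operation_succeeds([True, True, False]))
```

With 3 replicas spread over 5 zones, one unreachable replica does not block a write under this rule.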


                                                                                                      22
Swift Evaluation

Physical Deployment

Swift packages:
• Proxy Server
• Account Server
• Container Server
• Object Server

Storage Nodes
• OS installation: sda (raid 1 on disk1 and disk2)
• sdb, sdc, sdd …… sdk: disk3, disk4, disk5 …… disk12


                                                                        23
Swift Evaluation
Performance issue
CPU utilization reaches 100% even when no requests are being served

Testing environment:
Nodes: 5 x Dell R510
CPU: Intel® Xeon® E5360
Memory: 12GB
Replica: 3
No. of Objects:    150,000,000
No. of Accounts:   120,000
No. of Containers: 160,000

Audit:
swift-account-auditor:      1.5 min
swift-account-replicator:   9.5 min
swift-container-auditor:    8.4 min
swift-container-replicator: 9.3 min
swift-container-updater:    19.0 min
swift-object-updater:       0.1 s
swift-object-replicator:    10.5 hours
swift-object-auditor:       48.3 hours

Result:
The background daemons periodically scan all partitions, calculate checksums and synchronize
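
A back-of-the-envelope calculation shows what the object auditor figure implies. It assumes that replicas are spread evenly over the 5 nodes and that the 48.3 hours is the time one node needs for a full pass; neither assumption is stated on the slide.

```python
objects = 150_000_000   # from the test environment above
replicas = 3
nodes = 5
audit_hours = 48.3      # swift-object-auditor, assumed to be one full pass per node

# Object replicas stored on each node, assuming an even spread.
replicas_per_node = objects * replicas // nodes

# Sustained audit rate a node needs to finish one pass in that time.
audits_per_second = replicas_per_node / (audit_hours * 3600)

print(f"{replicas_per_node:,} object replicas per node")
print(f"about {audits_per_second:.0f} checksum verifications per second per node")
```

Under these assumptions each node holds 90,000,000 object replicas and has to verify roughly 518 of them per second without pause, which is consistent with the CPU staying busy even with no client requests.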

                                                                              24
●   Billing & Monitoring

[Diagram: components shown are Compute, Network, Storage, Monitoring (Metering), Billing, RPC, RDBMS, NoSQL, Database Client and Dashboard.]

                                                          25
●   Kanyun: Monitoring system

[Diagram: components shown are Compute, Network, Storage, RDBMS, NoSQL, Dashboard and Billing, plus the Kanyun parts listed here.]
• Worker: retrieves usage info
• Aggregator: calculates/stores metrics
• API daemon: responds to client requests

http://github.com/lzyeval/kanyun

                                                          26
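
The three roles named on the Kanyun slide can be sketched as a tiny in-memory pipeline. This is not Kanyun's code or API: the class, the function names and the sample metric are hypothetical, and a dictionary stands in for the NoSQL store.

```python
from collections import defaultdict
from statistics import mean


class Aggregator:
    """Stores raw samples and calculates summary metrics on request."""

    def __init__(self):
        self._samples = defaultdict(list)  # stand-in for the NoSQL store

    def store(self, instance_id, metric, value):
        self._samples[(instance_id, metric)].append(value)

    def query(self, instance_id, metric):
        """What an API daemon would return to a client such as billing."""
        values = self._samples.get((instance_id, metric), [])
        if not values:
            return None
        return {"count": len(values), "min": min(values),
                "max": max(values), "avg": mean(values)}


def worker_collect(read_usage, instance_ids, aggregator):
    """Worker role: retrieve usage info for each instance and hand it over."""
    for instance_id in instance_ids:
        for metric, value in read_usage(instance_id).items():
            aggregator.store(instance_id, metric, value)


def fake_usage(instance_id):
    # Placeholder for a real hypervisor query; returns made-up numbers.
    return {"cpu_percent": 10.0 if instance_id == "vm-1" else 30.0}


aggregator = Aggregator()
worker_collect(fake_usage, ["vm-1", "vm-2"], aggregator)
print(aggregator.query("vm-1", "cpu_percent"))
```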
●   Dough: Billing system

[Diagram: components shown are Compute, Network, Storage, Monitoring (Metering), RPC, RDBMS, Database Client and Dashboard, plus the Dough parts listed here.]
• Collector: check status / retrieve usage / create purchases
• Farmer: dispatch jobs
• API daemon: subscribe or unsubscribe products / query info

http://github.com/lzyeval/dough

                                                          27
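
The division of work on the Dough slide can be sketched as follows: an API for subscribing and unsubscribing, a farmer that dispatches one job per active subscription, and a collector step that retrieves usage and creates a purchase. None of this is taken from Dough's source; every class, method and price below is a hypothetical illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Subscription:
    tenant: str
    resource: str
    unit_price: Decimal  # price per unit of usage (illustrative)
    active: bool = True


class Billing:
    def __init__(self, get_usage):
        self._get_usage = get_usage  # callable: resource -> usage units
        self.subscriptions = []
        self.purchases = []

    # API daemon role: subscribe or unsubscribe products.
    def subscribe(self, tenant, resource, unit_price):
        sub = Subscription(tenant, resource, Decimal(unit_price))
        self.subscriptions.append(sub)
        return sub

    def unsubscribe(self, sub):
        sub.active = False

    # Farmer role: dispatch one collection job per active subscription.
    def run_farmer(self):
        for sub in self.subscriptions:
            if sub.active:
                self._collect(sub)

    # Collector role: retrieve usage and create a purchase record.
    def _collect(self, sub):
        usage = Decimal(str(self._get_usage(sub.resource)))
        self.purchases.append((sub.tenant, sub.resource, sub.unit_price * usage))


billing = Billing(get_usage=lambda resource: 4)  # made-up usage: 4 units
vm = billing.subscribe("tenant-a", "vm-1", "0.5")
billing.run_farmer()
print(billing.purchases)
```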
Q&A




      28
