OPENSTACK HA @PAYPAL

OpenStack Summit – Hong Kong – 2013
High Availability OpenStack at PayPal - OpenStack Summit Fall Hong Kong 2013


This is the presentation from the OpenStack Hong Kong Conference from Fall 2013.

There are many different blueprints describing how high availability can be achieved underneath an OpenStack cloud. At PayPal, we have chosen to use some of the common OpenStack best practices and to introduce common data center best practices to bring high availability to the management/control infrastructure within our cloud. Topics included:
  • Design of our OpenStack control infrastructure
  • Pros and cons of management and infrastructure racks separate from a compute rack
  • High availability requirements by component
  • Pros and cons of high availability choices external to and within the cloud
  • Trade-offs that need to be made now to ensure availability

http://www.openstack.org/summit/openstack-summit-hong-kong-2013/session-videos/presentation/openstack-high-availability-paypal

Published in: Technology, Health & Medicine
  • So a little bit about PayPal before we start: let’s quickly run through some key details on what PayPal is and what we do. We’re a payments company. You can think of PayPal as a digital wallet: one convenient, secure spot to keep all your ways to pay. And PayPal is not just on the internet for you to send money to a friend or buy something on eBay. Along with numerous merchants that let you pay with PayPal online, we are also in-store, in places like Home Depot and GNC. With this brick-and-mortar presence, you can leave your wallet at home, punch in your phone number and PIN code, and still buy something. With payment innovations like that, we continue to grow, as these numbers show: 137m active users and 300,000 dollars’ worth of payments per minute. This tells you that scale is important to us, and we scale on a global basis to meet the needs of our customers worldwide, especially here in Asia. We’re talking about nearly 200 markets and 26 currencies. We literally are the world’s most widely used digital wallet.
  • Shift from an enterprise design model to a cloud-based design. Elastically scale and self-heal infrastructure to accommodate unpredictable usage patterns of customers and internet commerce. Separate rapidly iterating customer experiences from core services. Reduce overall cost per transaction within the environment.
  • Infrastructure rack only for cloud management gear. Compute racks scale as far as: IP addresses run out; Neutron network(s) …; NVP gateway limit …
  • Two entry points for infrastructure: PayPal product developers, and cloud operators to manage the cloud. Centrally orchestrated using Heat. Local storage: HP 4 x 600 GB (mirror). Cisco 4948 & Arista 7050. Nicira NVP. F5 10.2.2 LB.
  • http://www.palominodb.com/blog/2012/12/10/benchmarking-ndb-vs-galera MariaDB. Bottleneck on the LB during image transfer. Heat has active/standby support, no active/active cluster. The Cinder Volume service doesn’t play well with a load balancer and VIP.
  • Talk about Cinder HA issues. VM create issues due to failed RabbitMQ message delivery. Issues upgrading without downtime for major version rollouts. No auto cleanup for stale DB rows. The API response is not consistent due to DB locks and DB connection threads.

    1. OPENSTACK HA @PAYPAL: OpenStack Summit – Hong Kong – 2013
    2. ABOUT PAYPAL: PayPal offers flexible and innovative payment solutions for consumers and merchants of all sizes.
       • 137,000,000 users
       • $300,000 payments processed each minute
       • 193 markets / 26 currencies
       • The World’s Most Widely Used Digital Wallet
    3. AGENDA
       • Why is HA important for PayPal?
       • Our Learning
       • Our Solution
       • What is not solved?
       • Q&A
    4. WHY HA IS IMPORTANT?
       • “no perceived downtime” for cloud users
       • Enterprise Class
       • Auto Scaling & Flex up/down can never break
       • API Integrations always succeed
       • Everyone expected to use the cloud
    5. AVAILABILITY REQUIREMENTS
       • No SPOF “Under the Cloud”
       • Scale Across the Data Center(s)
       • Scale Across Racks & Containers
       • Respect natural availability zones within the data centers
       • No ‘cloud’ can impact any other ‘cloud’
    6. INFRASTRUCTURE RACK: Layer 2 versus Layer 3
       [Diagram labels: Infrastructure / Controller Racks and Compute Racks, each with 10g Active, 10g Passive, and 1g Mgmt links; LB Active and LB Passive; Access; Cattle & Puppies]
    7. INFRASTRUCTURE RACK
       • OpenStack services are all VMs on KVM
       • Every infra component resides on 2+ nodes
       • Redundant physical racks
       • Redundant power/switches in each rack
       • Layer-3 connectivity between racks (no Layer 2)
       • Enterprise Grade Physical LB (floating VIP)
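The "2+ nodes" and "redundant physical racks" requirements on this slide can be checked mechanically against an inventory. The following is a minimal sketch, not PayPal's tooling: the function name `find_spofs`, the tuple layout, and every component, node, and rack name are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def find_spofs(placement, min_nodes=2, min_racks=2):
    """Return components that run on too few nodes or too few racks.

    placement: iterable of (component, node, rack) tuples.
    """
    nodes = defaultdict(set)
    racks = defaultdict(set)
    for component, node, rack in placement:
        nodes[component].add(node)
        racks[component].add(rack)
    return sorted(
        c for c in nodes
        if len(nodes[c]) < min_nodes or len(racks[c]) < min_racks
    )


# Hypothetical inventory, for illustration only.
placement = [
    ("keystone", "infra-a1", "rack-a"),
    ("keystone", "infra-b1", "rack-b"),
    ("glance", "infra-a2", "rack-a"),
    ("glance", "infra-a3", "rack-a"),  # two nodes, but both in one rack
    ("heat", "infra-b2", "rack-b"),    # a single node
]
print(find_spofs(placement))  # ['glance', 'heat']
```

A check like this only covers placement; redundant power and switches inside each rack would need their own inventory data.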
    8. COMPUTE
       [Diagram labels: callouts 1, 2, 3; LB Active / LB Passive pairs; Access; racks with 1g Mgmt, 10g Active, and 10g Passive links; four nodes each labelled "Compute Node 96 Hyperscale, 16 Core, 256GB RAM, 1.1T Disk"]
    9. COMPUTE
       [Diagram labels: Active and Passive Top Of Rack switches; 10g links bonded as bond0; 1g Management links; Hyperscale nodes with Raid-10]
    10. OPENSTACK SERVICES
       [Diagram of services and listening ports; labels listed in extraction order]
       • Components: swift storage nodes (swift-object, swift-container, swift-account); Browser; UDNS (DNSaaS); OpenStack Controllers; Quantum Server (quantum-api); LBaaS; DNS Master; F5 Load Balancers; Remedy API; httpd (dashboard); glance (glance-admin, glance-reg); NVP Service Nodes; Puppet DB; Puppet VIP; keystone (keystone-admin, keystone-api); nova (nova-api, nova-metadata-api, nova-volume-api); swift-proxy; Nicira NVP Controllers; NVP Gateways; Compute Node Hypervisor (OpenVswitch: ovs-vswitchd, ovsdb-server; puppet); MySQL DB; mysql 5; nova mq; Mongo DB
       • Ports (TCP unless noted): 6000, 6001, 6002, 80, 9696, 53, 10053, 22, 443, 161 (TCP and UDP), 9292, 9191, 6633 (openflow), 6632 (mgmt port), 35357, 5000, 8773, 8774, 8776, 8080, 8140, 61613, 3115, xxxx
    11. OPENSTACK CONSIDERATIONS
       • LB VIP for every service (unless it can’t)
       • Connect to LB VIP, not individual nodes
       • Script to close Server Connections
       • Pacemaker only works inside a single Layer-2 (not a large enterprise)
       • Auto Restart using Monit
       • MySQL
       • Swift Cluster
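The rule "connect to LB VIP, not individual nodes" can be enforced with a small audit of configured endpoints. This is a minimal sketch under stated assumptions: the VIP hostname, the node IP, and both helper names are hypothetical, and the port table uses the default API ports of that era's OpenStack services (the same numbers appear on the services diagram).

```python
from urllib.parse import urlparse

# Default API ports for a few services; adjust for a real deployment.
SERVICE_PORTS = {
    "keystone": 5000,
    "keystone-admin": 35357,
    "glance": 9292,
    "nova": 8774,
    "swift-proxy": 8080,
    "quantum": 9696,
}


def vip_endpoint(service, vip_host, scheme="http"):
    """Build the URL a client should use: always the VIP, never a node."""
    return f"{scheme}://{vip_host}:{SERVICE_PORTS[service]}"


def endpoints_bypassing_vip(endpoints, vip_hosts):
    """Return endpoint URLs whose host is not one of the known VIP hostnames."""
    return [url for url in endpoints if urlparse(url).hostname not in vip_hosts]


VIP = "openstack-vip.example.internal"  # hypothetical VIP hostname
endpoints = [
    vip_endpoint("keystone", VIP),
    vip_endpoint("glance", VIP),
    "http://10.0.0.11:8774",  # hypothetical direct node address
]
print(endpoints_bypassing_vip(endpoints, {VIP}))  # ['http://10.0.0.11:8774']
```

Running such an audit over every service's configuration catches clients that would keep talking to a failed node after the load balancer has already moved traffic elsewhere.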
    12. CONTINUED…
       • HEAT with Corosync/Pacemaker/keepalived (for now)
       • KeyStone / Nova / Glance / Swift Proxy
       • Rabbit MQ Cluster
       • Cinder Volume Service
    13. CINDER SERVICES WORKFLOW
       [Diagram labels: User request (create volume); Cinder API; AMQP; Cinder Scheduler; Cinder Volume; Storage Backend1; Storage Backend2; steps numbered 1 to 6]
       Figure shows a typical interaction between Cinder components to serve an end-user request (creating a new volume in this example).
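The create-volume interaction can be walked through with in-process queues standing in for the AMQP bus. This is a toy sketch of the general Cinder flow (API publishes, scheduler picks a backend, volume service acts on it), not Cinder code: the function name, the message fields, and the "most free capacity" placement policy are all assumptions made for illustration.

```python
import queue


def create_volume(size_gb, backends):
    """Walk one create-volume request through toy stand-ins for the services.

    backends: dict of backend name -> free capacity in GB (mutated on success).
    Returns the chosen backend and the ordered list of hops.
    """
    scheduler_q = queue.Queue()  # stands in for the scheduler's AMQP queue
    volume_q = queue.Queue()     # stands in for the volume service's AMQP queue
    trace = []

    # API: accept the user request and publish it to the message bus.
    scheduler_q.put({"op": "create_volume", "size_gb": size_gb})
    trace.append("api -> amqp")

    # Scheduler: consume the request and choose a backend (assumed policy:
    # the backend with the most free capacity that can fit the volume).
    msg = scheduler_q.get()
    trace.append("amqp -> scheduler")
    candidates = {name: free for name, free in backends.items()
                  if free >= msg["size_gb"]}
    if not candidates:
        raise RuntimeError("no backend has enough free capacity")
    msg["backend"] = max(candidates, key=candidates.get)
    volume_q.put(msg)
    trace.append("scheduler -> amqp")

    # Volume service: consume the message and allocate on the backend.
    msg = volume_q.get()
    trace.append("amqp -> volume")
    backends[msg["backend"]] -= msg["size_gb"]
    trace.append(f"volume -> {msg['backend']}")
    return msg["backend"], trace


backends = {"backend1": 100, "backend2": 500}  # hypothetical capacities
chosen, trace = create_volume(50, backends)
print(chosen)
print(trace)
```

Because every hop between services goes through a queue, each service can in principle be run as more than one instance consuming from the same queue, which is what the HA slide builds on.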
    14. CINDER SERVICES WITH HA
       [Diagram labels: User request (create volume); Load Balancer; Cinder API A, Cinder API B; Cinder Scheduler A, Cinder Scheduler B; AMQP Cluster; Cinder Volume A, Cinder Volume B; Storage Backend1, Storage Backend2; steps numbered 1 to 6]
       How HA is implemented for Cinder components:
       • API (stateless) – Load Balancer (A/A or A/P);
       • Scheduler (stateless) – Pacemaker, Queue itself (A/A or A/P);
       • Volume – Pacemaker, Queue itself (A/A or A/P).
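The difference between the A/P and A/A options for a stateless service can be sketched as two node-selection policies. This is an illustrative model only, not how the F5 or Pacemaker is configured: the class and function names and the node names are hypothetical, and health is passed in as a plain set rather than being probed.

```python
def active_passive(nodes, healthy):
    """A/P: always use the first healthy node in priority order."""
    for node in nodes:
        if node in healthy:
            return node
    raise RuntimeError("no healthy node")


class RoundRobin:
    """A/A: spread requests across all healthy nodes in turn."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def pick(self, healthy):
        # Try each node at most once, starting from the rotation pointer.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node in healthy:
                return node
        raise RuntimeError("no healthy node")


nodes = ["cinder-api-a", "cinder-api-b"]  # hypothetical node names

# A/P: traffic stays on A until A fails, then flips to B.
print(active_passive(nodes, {"cinder-api-a", "cinder-api-b"}))  # cinder-api-a
print(active_passive(nodes, {"cinder-api-b"}))                  # cinder-api-b

# A/A: both nodes serve traffic; a failed node is skipped.
rr = RoundRobin(nodes)
print([rr.pick({"cinder-api-a", "cinder-api-b"}) for _ in range(3)])
```

Either policy works for the API and scheduler because they hold no state between requests; the volume service is the harder case, as the UNRESOLVED slide notes.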
    15. UNRESOLVED
       • VIP-friendly Cinder Volume service
       • Seamless Upgrade Flip
       • Failed DB TX Reconciliation
       • Consistent API Response Time
    16. cloud@paypal.com (Confidential and Proprietary)
    17. THANK YOU
       • HTTP://GITHUB.COM/PAYPAL/AURORA
       • SCOTT CARLSON - @RELAXED137
       • RAJ GEDA
       • ZHITENG HUANG IRC:WINSTON-D
