Boston OpenStack Meet-Up: Deployment Case Study
1. Beth Cohen
Boston OpenStack Meet-Up
Sr. Architect
Global Electronics Manufacturer OpenStack Project
Cloud Technology Partners
617.721.7256 │ Beth.Cohen@cloudTP.com May 16th, 2012
TRANSFORM INNOVATE OPTIMIZE
2. Theme: Building Cloud Right
“First you build your cloud, then you need to operate it. That is less than
straightforward in a typical enterprise IT environment.”
Gordon Haff,
Cloud Computing Evangelist,
Red Hat
www.cloudTP.com 2 May 16, 2012 Boston OpenStack Conference
3. Boston OpenStack Cloud Meet-Up
Agenda
• Overview of client cloud build-out project
• Project challenges
• Three-pronged solution
• Organizational recommendations
• Network architecture recommendations
• Tools recommendations
• Lessons from the trenches
4. Overview of Cloud Project
• Client is a $3 billion IT infrastructure outsourcing company, separate from
Global Electronics Co.
• Executive mandate was to build a cloud in support of consumer
division activities
• Requirements:
– Create an IT organization to support the cloud infrastructure
– Internally develop applications to support millions of external customers
– Provide a platform and tools for building future applications
5. Objectives
• Review the client cloud project and architecture, and identify technical and operational risks
• Recommendations to address risks
• Recommend appropriate Swift architecture based on client planned uses
• Modify Swift staging environment
• Build Nova and Swift deployment automation based on the recommended architecture
• Document deployment plan/process
• Recommend appropriately scaled Nova network architecture based on client planned uses
6. Challenges
• Highly competitive consumer electronics sector
• Very traditional IT organization with poor reputation within the company
• IT organization had to continue to support its other internal and external customers
• Lack of experience with incorporating new technology into IT environments
• Little in-house cloud expertise
• Had chosen immature cloud technology
• Weak middle management support for project
• Slow business decision on planned system use
7. Risk
• Operations and business
• Nova network limitations
• Security
• Scalability
• High availability
Client Cloud Concept
8. Risk Analysis: Basic Approach and Principles
• Tactical approach to match client culture
• Limit duplication of efforts across the organization
• Create realistic target metrics
• Learning to live with continuous change
• An emphasis on operational automation
• Continuous testing and test driven development
• Develop a culture supportive of cross functional teamwork
• Identify small changes that make big impacts
9. Operations and Business Risks
• Vendor lock-in
• Complicated to configure and maintain
• Large number of protocols and configurations
• Many hardware components
– Use software load balancers, firewalls, etc. to reduce costs and increase flexibility
• Difficult administrative access
– Use modern data center best practices
10. Security Risks
• Remote administrative access
– Follow industry best practices
• Virtual machines could be compromised
– Isolate networks to limit exposure
• Server nodes share network w/VMs
– Isolate networks to limit exposure
• Single firewall in cloud
– Use software firewalls instead
• Potential access to network through switches
– Configure switches for maximum isolation of protocols
11. Scalability Risks
• HP TippingPoint IDS supports a maximum of 10 Gb/s bandwidth
– Change to software firewalls
• Limited North/South bandwidth doesn’t lend itself to horizontal scaling
– Use Layer 3 distributed core
• Network connectivity to the Internet is currently two 10 Gb/s up-links
– Add up-links as needed
• Number of racks in cluster limited to 15
– Use Layer 3 distributed core or Spine and Leaf configuration
• Block storage network allocates to specific racks rather than spreading the network traffic evenly across the compute nodes
– Use Layer 3 w/traffic shaping to distribute traffic
• VM state changes stored in core network fabric
– Use virtual networking to segregate VM traffic
12. High Availability Risks
• Several Single Points of Failure (SPOF) in environment
– Identify and change architecture to remove
• MLAG doesn’t scale
– Use Layer 3 network to eliminate bottleneck
• Costly hardware redundancy in systems
– Re-architect to take advantage of software redundancy instead
• Nova Network Issues
– Use virtual networking
– Nova Network node is a Single Point of Failure (SPOF)
– Immature networking software – Quantum is still under development
– Nova DHCP server is also a SPOF
13. Organizational Recommendations – Operation Principles
• Create an operational mindset across the team
– Develop a better understanding of open source, cloud architectures, Agile methodologies, continuous development,
test and integration, and DevOps concepts in general
• Coordinate the OpenStack development efforts across the project
– Build clear communication channels between functional groups
• Leverage the OpenStack community efforts more effectively
– Client team encouraged to contribute key features back to the OpenStack project
• Create more/better test metrics and test harnesses to support continuous and integrated dev/test
processes and automation
• Leverage existing Global Electronics Co. organizational expertise on how to run highly efficient
operations organizations
14. Network Architecture
15. Network Considerations
• Need for vendor independence – don’t rely on specific features of router/switch vendors
– Example: MLAG is vendor specific
• Need to massively scale ecosystem
– Hierarchical addressing modeled on Internet is only real option
• Need to design for cost efficient operations – minimize hardware redundancy, etc.
• No new hardware for staging environment
• No single point of failure in the network
• Tolerant of rack level failure
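
The last two considerations (no single point of failure, tolerance of rack-level failure) can be checked mechanically against a proposed topology. The following is a minimal sketch, not the client's tooling: the device names and both topologies are hypothetical, and the check simply removes one device at a time and tests whether the remaining endpoints can still reach each other.

```python
from collections import deque


def reachable(graph, start, removed):
    """Return the set of devices reachable from start while `removed` is down."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer != removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def single_points_of_failure(graph, endpoints):
    """List devices whose failure cuts off at least one surviving endpoint."""
    spofs = []
    for device in graph:
        survivors = [e for e in endpoints if e != device]
        if len(survivors) < 2:
            continue
        seen = reachable(graph, survivors[0], removed=device)
        if any(e not in seen for e in survivors):
            spofs.append(device)
    return spofs


# Hypothetical topology: every ToR switch links to both core switches.
redundant = {
    "edge": ["core1", "core2"],
    "core1": ["edge", "tor1", "tor2"],
    "core2": ["edge", "tor1", "tor2"],
    "tor1": ["core1", "core2"],
    "tor2": ["core1", "core2"],
}

# Hypothetical topology: the same racks behind a single core switch.
single_core = {
    "edge": ["core1"],
    "core1": ["edge", "tor1", "tor2"],
    "tor1": ["core1"],
    "tor2": ["core1"],
}

endpoints = ["edge", "tor1", "tor2"]
print(single_points_of_failure(redundant, endpoints))    # []
print(single_points_of_failure(single_core, endpoints))  # ['core1']
```

In the redundant layout, losing one ToR switch (a rack-level failure) or one core switch leaves the other racks and the edge connected; in the single-core layout the core switch is flagged.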
16. Network Requirements
• Independent network requirements for physical server nodes and virtual machines (VM)
• Need to isolate VM networking information from the core network for scaling
• Many components interacting at different levels of the system stack add complexity
• Need to isolate networks and separate functions for security
• Separate networks by function for traffic shaping
• Complex data paths – data between VMs (East/West) and in and out of the system (North/South)
• OpenStack has a weak high availability architecture
17. Network Decisions
• Choose virtual networking or flat networking
– Recommend virtual networking
• eBGP or static with Suwon backbone
– Recommend eBGP
• Software or hardware load balancing
– Recommend software LB
• Spine and leaf or distributed core topology
– Recommend distributed core
• Network shared with storage or dedicated storage network
– Recommend shared network
• VIP or MPIO (multipath) for iSCSI redundancy
– Recommend MPIO
18. Option 1: Shared Storage and VM Network
19. Option 2: Separate Storage and VM Network
20. Option 1: Layer 3 with Virtual Networking
Diagram labels, top to bottom:
• Suwon Network Edge: 182.196.0.0/22 and 0.0.0.0/0; addresses 182.196.0.1 and 182.196.0.255
• Two EBGP /30 links down to the cloud
• Cloud Network Edge: Eth1 182.196.0.100 and 182.196.0.101; private AS, e.g. AS64512; Eth0 192.168.8.5 and 192.168.9.10; two virtual switches
• Cloud Backbone Network: 192.168.1.5, 192.168.1.7, 192.168.2.5, 192.168.2.7
• Nova Compute Node 1 and Nova Compute Node 2, each with an iSCSI SAN
• Node 1 VM view: VMs on Tapx/10.10.2.x; Tap0/182.196.2.7; Eth 192.168.1.5; virtual switch
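
The addressing on this slide can be sanity-checked with Python's standard `ipaddress` module. This is only an illustrative sketch: the addresses are copied from the diagram labels, while the variable names and the grouping into "public" and "backbone" lists are assumptions made for the example.

```python
import ipaddress

# Block shown at the Suwon network edge on the slide.
public_block = ipaddress.ip_network("182.196.0.0/22")

# Addresses from the diagram that should fall inside that block
# (the grouping is an assumption made for this example).
public_addresses = ["182.196.0.1", "182.196.0.100", "182.196.0.101", "182.196.2.7"]

# Addresses from the diagram on the cloud backbone side.
backbone_addresses = [
    "192.168.8.5", "192.168.9.10",
    "192.168.1.5", "192.168.1.7", "192.168.2.5", "192.168.2.7",
]


def in_block(address, block):
    """True if the address belongs to the given network block."""
    return ipaddress.ip_address(address) in block


public_count = public_block.num_addresses            # a /22 holds 1024 addresses
last_public = public_block.broadcast_address         # 182.196.3.255
all_public_in_block = all(in_block(a, public_block) for a in public_addresses)
backbone_is_private = all(
    ipaddress.ip_address(a).is_private for a in backbone_addresses
)

print(public_count, last_public, all_public_in_block, backbone_is_private)
```

The check confirms that the VM-facing address 182.196.2.7 sits in the same /22 as the edge interfaces, while the backbone uses private 192.168.x.x space that is never advertised upstream.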
21. Network Recommendations
• Change from a Layer 2 to a Layer 3 configuration to build a dense multipath network
core that supports multi-directional scaling and flexibility
• Isolate virtual networks using traffic shaping for performance
• Isolate virtual networks using L2 over L3 encapsulation
• Use eBGP to connect to the Internet up-link
• Use iBGP for internal traffic on the mesh
• Determine best configuration for block storage network
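
The eBGP/iBGP split follows directly from AS numbers: a BGP session between routers in the same AS is iBGP, and a session between different ASes is eBGP. A small sketch, assuming the private AS from the earlier diagram (AS64512) for the cloud and a purely hypothetical upstream AS number; the private-use ranges are those reserved in RFC 6996.

```python
# Private-use AS numbers reserved by RFC 6996.
PRIVATE_AS_16BIT = range(64512, 65535)            # 64512-65534 inclusive
PRIVATE_AS_32BIT = range(4200000000, 4294967295)  # 4200000000-4294967294 inclusive


def is_private_asn(asn: int) -> bool:
    """True if the AS number is in a private-use range."""
    return asn in PRIVATE_AS_16BIT or asn in PRIVATE_AS_32BIT


def session_type(local_as: int, peer_as: int) -> str:
    """Classify a BGP session by comparing the two AS numbers."""
    return "iBGP" if local_as == peer_as else "eBGP"


CLOUD_AS = 64512     # private AS shown as an example on the slide
UPSTREAM_AS = 65001  # hypothetical value; the real upstream AS is not in the deck

print(is_private_asn(CLOUD_AS))             # True
print(session_type(CLOUD_AS, CLOUD_AS))     # iBGP: links inside the mesh
print(session_type(CLOUD_AS, UPSTREAM_AS))  # eBGP: the up-link
```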
22. Spine and Leaf Cloud Network Diagram
Expand network by adding either aggregators or ToR switches.
Each is independent, which allows maximum flexibility.
Notes: Links from ToR switches up to the core aggregation layer are either 10 Gb/s or 40 Gb/s.
All links in the network are /30 networks running iBGP.
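
Because every link is its own /30, adding an aggregator or a ToR switch only consumes more /30 subnets from a link-addressing block. The sketch below shows one way to plan that allocation; the 10.255.0.0/24 block and the switch names are hypothetical, and it assumes a full mesh of one link per aggregator/ToR pair.

```python
import ipaddress
from itertools import product


def plan_links(block, aggregators, tors):
    """Assign one /30 to every aggregator-to-ToR link.

    Returns {(aggregator, tor): (subnet, aggregator_ip, tor_ip)}.
    """
    network = ipaddress.ip_network(block)
    pairs = list(product(aggregators, tors))
    available = 2 ** (30 - network.prefixlen) if network.prefixlen <= 30 else 0
    if len(pairs) > available:
        raise ValueError(
            f"{block} holds {available} /30 subnets, {len(pairs)} links requested"
        )
    plan = {}
    for (agg, tor), subnet in zip(pairs, network.subnets(new_prefix=30)):
        agg_ip, tor_ip = subnet.hosts()  # a /30 has exactly two usable addresses
        plan[(agg, tor)] = (subnet, agg_ip, tor_ip)
    return plan


# Hypothetical example: 2 aggregators x 3 ToR switches = 6 links.
links = plan_links("10.255.0.0/24", ["agg1", "agg2"], ["tor1", "tor2", "tor3"])
for (agg, tor), (subnet, agg_ip, tor_ip) in links.items():
    print(f"{agg} {agg_ip} <-> {tor} {tor_ip}  ({subnet})")
```

A /24 holds 64 such subnets, so in this sketch the block would be exhausted at, for example, 8 aggregators and 8 ToR switches.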
23. Scaling of Spine and Leaf Network
24. Distributed Core Cloud Network Diagram
Notes: Links from ToR switches up to the core aggregation layer are either 10 Gb/s or 40 Gb/s.
All links in the network are /30 networks running iBGP.
Expand network by adding either aggregators or ToR switches.
25. Scaling of Distributed Core Network
26. Option 2: Scaling Using Availability Zones
27. Scaling of Distributed Core Network with AZ
28. Tools Recommendations – Deployment Using Crowbar
• Support for HP hardware added
• 2 Crowbar servers installed
– One on management network – ss15
– One on Swift Proxy rack – 106
• Documentation of Crowbar deployment – Completed
• Stabilize Crowbar system for production – Waiting for April 1.3 Release
• Crowbar development work remaining
– Barclamp for DHCP relay for subnets near completion
• Infra team Crowbar training to be done by Client
30. Advice From the Trenches
• Think holistically
• Top management needs to actively support cross organizational change
• Focus on building in-house expertise in cloud:
– Architecture
– Networking
– Applications
– Data center operations
• Use the rack as the base unit for scaling
• Scale the cloud horizontally, not vertically
• Automate, automate, automate!
31. Boston OpenStack Meet-Up
May 16th, 2012
Questions?
cloudTP.com
Beth Cohen
Cloud Technology Partners P: 617.674.0874
Chief Architect & Technology Officer Info@cloudtp.com
617.721.7256 │ Beth.Cohen@cloudTP.com 308 Congress St, 5th Floor Boston MA, 02210
TRANSFORM INNOVATE OPTIMIZE