Continuous Validation at Scale


Slide deck by Vijay Seshadri of Symantec, from the OpenStack at Mega-scale Meetup on April 2nd, 2014.


  1. Continuous Validation at Scale
     Vijay Seshadri, Cloud Platform Engineering (CPE), Symantec
     Symantec Confidential – Cloud Platform Engineering
  2. Agenda
     (1) CPE Overview
     (2) What is Continuous Validation?
     (3) SCTF Overview & Usage
     (4) SCTF Design and Roadmap
  3. CPE Overview
     • CPE charter: a consolidated cloud infrastructure that offers platform services for Symantec cloud applications
     • Symantec's cloud infrastructure is already operating at scale:
       – Compute: reputation-based security
       – Storage: consumer and enterprise backup
       – Network: hosted email security
     • How do we carry the best practices and insights from operating at scale over to the new platform?
     • Core objective: a secure, scalable and reliable OpenStack-based cloud platform
  4. CPE Platform Architecture (diagram)
     Cloud applications, CLIs and scripts reach the platform through a REST/JSON API. Labeled components:
     • Core services: Compute, Networking, Storage
     • Compute (Nova), Image (Glance), SDN (Neutron), Load Balancing, DNS, SSL
     • SQL, K/V Store, Object Store, Mem Cache
     • Big Data: Batch Analytics, Stream Processing
     • Messaging: Msg Queue, Email Relay
     • Identity & Access (Keystone): Authn, Roles, User Mgmt, Tenancy, Quotas
     • Supporting services: Logging, Metering, Monitoring, Deployment
     • Web Portal
  5. CPE Reference App #1 – Log Collection Service
     Log sources (e.g. security metadata, install logs, telemetry) send data to a Log Collection app running on two VMs (VM0, VM1) behind a load balancer in the CPE cloud. The diagram also shows DNS queries, Keystone authentication and a container in the Object Store (Swift). Steps:
       1. Acquire an authentication token
       2. Create two VMs, associate a network and start them using a CentOS image
       3. Create a LB endpoint, place the two VMs in it and configure a DNS entry
       4. Provision a container in the Object Store
       5. Deploy and start the Flask application
       6. Fetch log files from the Object Store
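The six steps on slide 5 form an ordered scenario in which each step consumes what earlier steps produced (the token, the VM names, the endpoint). A minimal Python sketch, assuming a hypothetical in-memory `FakeCloud` in place of Keystone, Nova, the load balancer/DNS and Swift; its method names are illustrative only and are not OpenStack or SCTF APIs, and step 5 is reduced to writing a single log object.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for Keystone, Nova, LB/DNS and Swift."""
    vms: dict = field(default_factory=dict)
    dns: dict = field(default_factory=dict)
    containers: dict = field(default_factory=dict)

    def _require(self, token: str) -> None:
        # Every call after step 1 depends on the token acquired there.
        if not token.startswith("token-"):
            raise PermissionError("invalid token")

    def authenticate(self, user: str) -> str:
        return f"token-{user}"

    def create_vm(self, token: str, name: str, image: str) -> str:
        self._require(token)
        self.vms[name] = {"image": image, "status": "ACTIVE"}
        return name

    def create_lb(self, token: str, members: list, hostname: str) -> str:
        self._require(token)
        self.dns[hostname] = list(members)  # LB endpoint registered under a DNS name
        return hostname

    def create_container(self, token: str, name: str) -> None:
        self._require(token)
        self.containers[name] = {}


def run_log_collection_scenario(cloud: FakeCloud) -> dict:
    """Execute the six reference-app steps in order; any exception aborts the run."""
    ctx = {}
    # Step 1: acquire an authentication token.
    ctx["token"] = cloud.authenticate("cpe-tester")
    # Step 2: create two VMs from a CentOS image.
    ctx["vms"] = [cloud.create_vm(ctx["token"], f"vm{i}", "centos") for i in range(2)]
    # Step 3: put both VMs behind an LB endpoint with a DNS entry.
    ctx["endpoint"] = cloud.create_lb(ctx["token"], ctx["vms"], "logs.example.test")
    # Step 4: provision an object-store container.
    cloud.create_container(ctx["token"], "logs")
    # Step 5 (stand-in): the "deployed app" writes one log object.
    cloud.containers["logs"]["install.log"] = b"install ok"
    # Step 6: fetch the log file back from the container.
    ctx["fetched"] = cloud.containers["logs"]["install.log"]
    return ctx
```

Threading a context dict through the steps makes the dependencies explicit: if step 1 fails, nothing downstream runs against stale or missing state.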
  6. Problem Statement
     • Cloud infrastructure at scale is a highly dynamic environment:
       – Diverse cloud workloads; application behaviors and patterns cannot be predicted
       – Resources (machines, network equipment, etc.) are added and removed
       – Configuration drifts over time
       – External events cause large swings in network, compute and storage consumption
       – Stability issues appear when you cross scale boundaries (grow by an order of magnitude)
     • Key question: what validation tools and frameworks do we need to identify issues at scale and remediate them?
  7. What capabilities do we need in a validation framework?
     • Ability to test generic REST/JSON endpoints (services), including OpenStack and platform services
     • Ability to quickly create functionality, stability and performance tests without burdening developers
     • Ability to customize and extend test conditions and/or verification functions
     • An independent channel of verification
       – Higher-order verification: don't just check the return status of individual services; verify end-to-end function
       – Extensible, pluggable design
     • Continuous visibility into the health and performance of the production cloud
       – Proactively monitor transient and persistent errors
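Slide 7's point about higher-order verification can be illustrated with a hypothetical object store that answers every write with 201 but silently drops the data. The sketch below (all names are illustrative, not SCTF's) contrasts a status-only check with an end-to-end check that reads the object back.

```python
class FlakyStore:
    """Hypothetical object store whose PUT always reports success."""

    def __init__(self, drop_writes: bool) -> None:
        self.drop_writes = drop_writes  # simulates a bug that loses data
        self.objects: dict = {}

    def put(self, key: str, data: bytes) -> int:
        if not self.drop_writes:
            self.objects[key] = data
        return 201  # "Created" is returned whether or not the data was kept

    def get(self, key: str) -> tuple:
        if key in self.objects:
            return 200, self.objects[key]
        return 404, b""


def status_only_check(store: FlakyStore, key: str, data: bytes) -> bool:
    """Trust the return status of the individual call."""
    return store.put(key, data) == 201


def end_to_end_check(store: FlakyStore, key: str, data: bytes) -> bool:
    """Write, read back through a separate call, and compare the payloads."""
    if store.put(key, data) != 201:
        return False
    status, body = store.get(key)
    return status == 200 and body == data
```

Against the broken store the status-only check passes while the end-to-end check fails, which is the class of fault a per-service return code cannot reveal.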
  8. Continuous Validation State Transitions
  9. Symantec Cloud Test Framework (SCTF)
     • What is SCTF?
       – A set of Python libraries, scripts and simple text files (YAML) that facilitate validation of a cloud infrastructure
       – Primitives for expressing REST requests and validating responses
     • Annotated example showing a built-in exec function, a test command and a validation condition
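Slide 9 describes SCTF tests as YAML text that pairs a test command with a validation condition, run through a built-in exec function. The sketch below assumes a parsed test entry is a Python dict with hypothetical keys `exec`, `expect_returncode`, `expect_stdout_contains` and `timeout`; SCTF's actual YAML schema is not shown in the deck.

```python
import subprocess
import sys


def run_exec_test(case: dict) -> bool:
    """Run a hypothetical exec-style test case and apply its validation condition."""
    proc = subprocess.run(
        case["exec"], capture_output=True, text=True, timeout=case.get("timeout", 30)
    )
    # Validation condition 1: the command's exit status.
    ok = proc.returncode == case.get("expect_returncode", 0)
    # Validation condition 2 (optional): a substring expected on stdout.
    needle = case.get("expect_stdout_contains")
    if needle is not None:
        ok = ok and needle in proc.stdout
    return ok


# This dict mirrors what one YAML test entry might look like once parsed.
example_case = {
    "name": "python_prints_hello",
    "exec": [sys.executable, "-c", "print('hello')"],
    "expect_returncode": 0,
    "expect_stdout_contains": "hello",
}
```

Keeping the command and the condition as plain data is what lets test authors write new cases without writing Python.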
  10. How to run SCTF?
      Annotated example showing the input YAML file, the test case name and the validation summary
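Slide 10 shows a run that reports results per test case name and ends with a validation summary. A minimal runner along those lines, assuming each test case is a named Python callable returning a boolean; this is a simplification, since real SCTF input is a YAML file, and the summary format here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    name: str
    passed: bool
    detail: str = ""


def run_suite(cases: dict[str, Callable[[], bool]]) -> list[Result]:
    """Run each named check, turning exceptions into failures instead of aborting."""
    results = []
    for name, check in cases.items():
        try:
            results.append(Result(name, bool(check())))
        except Exception as exc:  # a crashing check counts as a failure
            results.append(Result(name, False, repr(exc)))
    return results


def summarize(results: list[Result]) -> str:
    """One PASS/FAIL line per test case, then an overall count."""
    lines = [f"{'PASS' if r.passed else 'FAIL'}  {r.name}  {r.detail}".rstrip() for r in results]
    passed = sum(r.passed for r in results)
    lines.append(f"{passed}/{len(results)} test cases passed")
    return "\n".join(lines)
```

Catching exceptions per case keeps one broken test from hiding the results of the rest of the file.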
  11. SCTF Usage – Simple web request
      Annotated example showing the built-in web service function, the request URL and method, and the expected response code
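Slide 11's simple web request reduces to three fields: a URL, a method and an expected response code. A standard-library sketch of such a check; the function name and parameters are hypothetical, and this is not SCTF's built-in web service function.

```python
import urllib.error
import urllib.request


def check_web_request(url: str, method: str = "GET", expect_status: int = 200,
                      timeout: float = 10.0) -> bool:
    """Send one HTTP request and compare the response code with the expected one."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still a valid observation to check.
        status = err.code
        err.close()
    return status == expect_status
```

Reading the code from `HTTPError` lets a test assert on an expected 404 or 503 as easily as on a 200.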
  12. SCTF Usage – Reusable Primitives
      Annotated example showing a test procedure name, variable definitions and a test case definition
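Slide 12 separates a named test procedure, variable definitions and the test cases that use them, so one procedure can be reused with different values. A sketch of that idea using `string.Template` substitution; the procedure name, step strings and variable names are hypothetical.

```python
from string import Template

# Hypothetical procedure library: each procedure is a list of step templates.
PROCEDURES = {
    "boot_and_check": [
        "create_vm name=${vm_name} image=${image}",
        "exec ssh ${user}@${vm_name} uptime",
    ],
}

# Hypothetical shared variable definitions inherited by every test case.
GLOBAL_VARS = {"image": "centos", "user": "cloud-user"}


def expand_case(case: dict) -> list[str]:
    """Merge global and per-case variables, then fill in the named procedure."""
    variables = {**GLOBAL_VARS, **case.get("vars", {})}
    # substitute() raises KeyError if a step references an undefined variable.
    return [Template(step).substitute(variables) for step in PROCEDURES[case["procedure"]]]
```

Per-case values override the shared ones, and an undefined variable fails loudly at expansion time rather than producing a half-filled command.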
  13. SCTF Usage – Independent channel of verification
      Annotated example showing a built-in exec function started after VM creation, an ssh command line and retry arguments
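On slide 13 the exec check runs right after VM creation and carries retry arguments, since a freshly created VM may not accept SSH connections immediately. A generic retry helper under that assumption; `attempts`, `delay` and `backoff` are hypothetical parameter names, and the exponential backoff is an addition of this sketch rather than something the slide specifies.

```python
import time
from typing import Callable


def retry(check: Callable[[], bool], attempts: int = 5, delay: float = 2.0,
          backoff: float = 2.0) -> bool:
    """Call check() until it returns True or attempts run out, sleeping between tries."""
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except OSError:
            pass  # e.g. connection refused while the VM is still booting
        if attempt < attempts:
            time.sleep(wait)
            wait *= backoff  # hypothetical exponential backoff between attempts
    return False
```

In a real run, `check` could wrap an `ssh` invocation through `subprocess.run` and test its return code.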
  14. SCTF Design
  15. SCTF Roadmap
      • Stream files: enable large file downloads
      • Test runner: execute all test files in a directory hierarchy
      • Preserve comments: retain comments after programmatic manipulation
      • Improve error reporting: make stack traces and error messages more descriptive
      • Incorporate Salt to allow remote execution and job management
      • Allow tests to run in parallel; possible approaches:
        – Use Pykka for actors in a single process
        – Call out to Julia and use its parallel facilities
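The roadmap's test runner would execute every test file in a directory hierarchy, and parallel execution is listed as a goal. The sketch below discovers files with `pathlib` and runs them on a `concurrent.futures` thread pool; this is a swapped-in technique, not the Pykka or Julia approaches the slide proposes, and `run_file` is a hypothetical callback that executes one file and returns pass/fail.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def discover(root: Path, pattern: str = "*.yaml") -> list[Path]:
    """Find every test file under root, recursively, in a stable order."""
    return sorted(root.rglob(pattern))


def run_all(root: Path, run_file: Callable[[Path], bool], workers: int = 4) -> dict[Path, bool]:
    """Run each discovered file with run_file() on a thread pool; map file -> pass/fail."""
    files = discover(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with files.
        return dict(zip(files, pool.map(run_file, files)))
```

Threads suit tests that mostly wait on network I/O; CPU-heavy checks would call for processes instead.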
  16. SCTF Roadmap (cont'd)
      • Allow test results to be written to files and databases
      • Allow test documentation to be queried
      • Determine why a test failed:
        – Diagnosis
        – Remediation
        – Validation of the remediation
      • Add timing and metadata to test output
      • Use performance as a test criterion
      • Add an extension type so that type handlers can be added at run time
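Three roadmap items, timing and metadata in test output, performance as a test criterion, and writing results to a database, combine naturally. A sketch assuming a per-test time budget (`max_seconds`, hypothetical) and an SQLite table whose schema is illustrative only.

```python
import sqlite3
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def timed_check(name: str, check: Callable[[], bool],
                max_seconds: Optional[float] = None) -> dict:
    """Run check(), record wall time and metadata; optionally fail if it is too slow."""
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    passed = bool(check())
    elapsed = time.perf_counter() - t0
    if max_seconds is not None and elapsed > max_seconds:
        passed = False  # performance used as a pass/fail criterion
    return {"name": name, "started": started, "elapsed_s": elapsed, "passed": passed}


def store_results(conn: sqlite3.Connection, results: list[dict]) -> None:
    """Append result rows to a hypothetical 'results' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(name TEXT, started TEXT, elapsed_s REAL, passed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO results VALUES (:name, :started, :elapsed_s, :passed)", results
    )
    conn.commit()
```

Storing the timing alongside the verdict means a slowly degrading service shows up as a trend before it crosses the failure threshold.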
  17. Summary/Conclusion
      • We plan to use SCTF as a primary means of functional and performance validation:
        – Enable continuous monitoring of the stability and performance of the CPE cloud
        – Associate diagnosis and remediation with failing functional tests
        – Scale test generation along with the cloud
        – Shorten mean time to resolution
      • We plan to collaborate with similar open source projects
      • Our primary motivation is to ensure the stability of an OpenStack-based cloud deployed at scale