TopStack Architecture
Q3 2013 Update
The basics
Overview
 TopStack is a suite of services to extend Infrastructure as a Service (IaaS) solutions
and deliver key Platform Services (PaaS)
 TopStack delivers a clean-room implementation of many of Amazon’s most popular
services
 TopStack runs on private clouds as well as third party public clouds
  The 2013-Q3 focus for TopStack is to complement OpenStack.
 TopStack is available in both a Community Edition (open source) and an Enterprise
Edition (commercial license & support).
Source Code Organization & Control
  Source code for TopStack is stored in Git; the open source code is hosted on GitHub
 Each service is stored in a separate repo
 Common repo: ToughCore, for shared utility code
 Common repo: ToughResources, for shared static assets
 All repos have a master branch for current production code
  A tag is applied for each production release
  All repos have at least one development branch for current development
  Additional feature branches are created for feature development, as needed
Build Management and Quality Assurance
 Services are built with Ant build files, with Maven tasks for dependency resolution
 Dependencies are resolved through local file copies (in dev mode)
 Dependencies are resolved through Jenkins artifacts (in build mode)
 Builds are managed through Jenkins continuous integration
  All services include unit tests that run with each build
 Services are deployed as part of continuous integration to Dev clouds
  After deployment, Java integration tests run against the fresh deployment
  Any failed integration test causes the build to be marked broken
  Once a day, a “long-running” set of integration tests is run
  Long-running tests spin up instances and test advanced connectivity
Continuous Deployment
 Continuous deployment is performed by Jenkins, with jobs deploying to Dev, etc.
  Deployments are pushed to multiple cloud platforms, versions, …
[Diagram: deployment targets labeled Cloud 1 and Cloud 2]
Installation & Deployment
  The installation package is a single file (.tar.gz) produced by the continuous build
  Unpacked, the installation package consists of:
 Master installation shell script
 Install guide (PDF)
 Packaged services, to be deployed by installation script
 Base Image configuration script
 Installation may be re-run as needed to install/configure additional instances
 Options to installation allow the installer to include/exclude particular services
 Required supporting services are always installed
Deployment
 Current tested deployment configuration:
  OpenStack Grizzly or later (older versions work, but are not commercially supported)
  nova-compute with libvirt+KVM, libvirt+QEMU, libvirt+XenServer
  nova-volume/Cinder, any iSCSI backend
  nova-network/Quantum, single VLAN
  Linux VM, Ubuntu 12.04 or later
Services Offered to Customers
 Elastic Load Balancer
 Route 53
 Relational Database Service
 ElastiCache
 Simple Queue Service
 CloudWatch
 CloudFormation
 Elastic Beanstalk
 Auto Scale
This deck won't cover these in any detail.
Internal Services & Components
 Internal Services (daemons):
 Service registry & configuration
 Orchestration & events
 Job Scheduling
 Common Components:
 Common logging
 Persistence
 Instance configuration (Chef)
 Authorization & access control
 Quotas & metering
 …
 Cloud platform bindings
 Instrumentation
 Administration
 Inter-Service Communication
Deployment Model
Deployment Model - Evaluation/TopStack Lite
[Diagram: single-VM deployment. Labels shown, in extraction order: Cloud Image Repo, TopStack Master VM, Tomcat 7, TopStack SLB, DNS53, SQS, CloudWatch, RDS, Other TopStack Services, Apache 2, StackStudio, Chef Server, PubSub Queue, TS Base Image, TopStack DNS53, MySQL]
Deployment Model - TopStack Enterprise
[Diagram: multi-VM deployment. Labels shown, in extraction order: TopStack SLB VM, TopStack Service VM2, TopStack SLB, TopStack Service VM1, Tomcat 7, DNS53, SQS, CloudWatch, RDS, Other TopStack, StackStudio VM, Apache 2, StackStudio, Cloud Image Repo, Chef VM, Chef Server, [Optional], DB VM, MySQL, Queue VM, PubSub Queue, DNS53 VM, DNS53, TS Base Image]
Deployment Model - TopStack HA
[Diagram: high-availability deployment. Labels shown, in extraction order: Chef VM, TopStack Service VMn, TopStack Service VM2, StackStudio VM, Apache 2, StackStudio, TopStack Service VM1, TopStack ELB VM2, TopStack ELB VM1, Cloud Image Repo, [Optional], DB Active, MySQL, DB Standby, MySQL, Chef Cluster, Chef Server, Chef VM, Queue Cluster, PubSub Queue, Chef VM, DNS53 Cluster, DNS53, TS Base Image]
Internal Services
Service Registration & Configuration
 All services must register with DNS53 on startup
  DNS53 maintains a private zone for Transcend internal use
 Installation creates addresses for TopStack hosts
 Registration creates CNAMEs for individual services in DNS
 DNS information is used by Transcend load balancer to direct traffic
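The registration flow above can be sketched as a small in-memory model. This is an illustration only: the `PrivateZone` class, its method names, the `pick_backend` helper, and the zone and host names are all hypothetical and are not the DNS53 API. A real DNS name holds at most one CNAME, so the per-service list here exists purely to show how a load balancer could choose among registered hosts.

```python
from collections import defaultdict


class PrivateZone:
    """Toy stand-in for a private DNS zone (hypothetical, not the DNS53 API)."""

    def __init__(self, zone):
        self.zone = zone
        self.addresses = {}              # fully qualified host name -> IP
        self.cnames = defaultdict(list)  # service alias -> target host names

    def add_host(self, host, ip):
        """Installation step: create an address record for a TopStack host."""
        self.addresses[f"{host}.{self.zone}"] = ip

    def register_service(self, service, host):
        """Startup step: a service registers an alias pointing at its host."""
        target = f"{host}.{self.zone}"
        if target not in self.addresses:
            raise KeyError(f"unknown host: {target}")
        alias = f"{service}.{self.zone}"
        if target not in self.cnames[alias]:
            self.cnames[alias].append(target)

    def resolve(self, service):
        """Return the IPs of every host currently registered for a service."""
        targets = self.cnames.get(f"{service}.{self.zone}", [])
        return [self.addresses[t] for t in targets]


def pick_backend(zone, service, request_number):
    """Load balancer step: round-robin over the hosts registered for a service."""
    ips = zone.resolve(service)
    if not ips:
        raise LookupError(f"no hosts registered for {service}")
    return ips[request_number % len(ips)]
```

A new service host only has to call `register_service` on startup for the load balancer to begin routing traffic to it.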
Orchestration & Events
[Diagram: request flow through orchestration. Labels shown: Client, Request, Response, SLB, TopStack Service, Request Handler Thread, Open Transaction, Create WF, Commit Transaction, Continuation, Request ID Cache, IaaS Provider, TopStack Workflow, Workflow Step 1, Workflow Step 2, Cloud Op Task, Notify Task, Complete, Workflow Error State, Rollback Resources, Quartz, RDS Work, CF Work, SLB Work]
 Services only own workflow steps and a light servlet for request/response
 Pub-sub mechanism between TopStack API front end and service workers
 ZeroMQ (http://www.zeromq.org)
 Protocol Buffers as serialization format for ZeroMQ
 Workflow solution to handle multiple asynchronous service steps:
 Mule ESB (http://www.mulesoft.org/)
 Asynchronous requests from HTTP handlers
 Tomcat 7 with Servlet 3.0 asynchronous servlets (continuations)
 Request IDs to marry asynchronous responses to requests
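How a request ID can tie an asynchronous response back to its request is sketched below. This is a minimal sketch under stated assumptions: a standard-library queue stands in for the ZeroMQ pub-sub channel, plain tuples stand in for Protocol Buffers messages, and `RequestIdCache`, `worker`, and `handle_request` are hypothetical names. A real asynchronous servlet would release its thread instead of blocking; the blocking wait here only keeps the sketch short.

```python
import queue
import threading
import uuid
from concurrent.futures import Future


class RequestIdCache:
    """Maps request IDs to pending responses (hypothetical name)."""

    def __init__(self):
        self._pending = {}
        self._lock = threading.Lock()

    def open(self):
        """Allocate a fresh request ID and a future for its response."""
        request_id = str(uuid.uuid4())
        future = Future()
        with self._lock:
            self._pending[request_id] = future
        return request_id, future

    def complete(self, request_id, result):
        """Deliver a response; unknown or already-answered IDs are ignored."""
        with self._lock:
            future = self._pending.pop(request_id, None)
        if future is None:
            return False
        future.set_result(result)
        return True


def worker(work_queue, cache):
    """Service worker: consume messages until a None sentinel arrives."""
    while True:
        message = work_queue.get()
        if message is None:
            break
        request_id, action = message
        cache.complete(request_id, f"{action}: done")


def handle_request(action, work_queue, cache, timeout=5.0):
    """Front end: publish the work, then wait for the matching response."""
    request_id, future = cache.open()
    work_queue.put((request_id, action))
    return future.result(timeout=timeout)
```

Because the worker only ever sees the request ID, it needs no reference to the HTTP connection that originated the work.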
Workflow
 Many services consist of multiple operations, both synchronous and asynchronous
  For example, when a relational database is created:
 An instance must be spun up
 Volume is created (in parallel)
 Public IP must be associated
 Instance startup is complete
 Volume is attached
 Database installation is performed
 etc.
 Any workflow step may fail, in which case:
 Allocated resources must be torn down, freed
 Failure must be reported, handled appropriately
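The failure handling described above (tear down what was allocated, then report) can be sketched as a list of steps, each paired with an undo action. This is a simplified illustration, not the Mule-based workflow itself: `run_workflow` and `WorkflowError` are hypothetical names, and steps run strictly in sequence, so the parallel volume creation is not modeled.

```python
class WorkflowError(Exception):
    """Raised after a failed workflow has been rolled back."""


def run_workflow(steps):
    """Run (name, action, undo) steps in order.

    Returns the names of the completed steps. If a step fails, the undo
    callables of the steps that already completed are run in reverse order,
    and WorkflowError is raised to report the failure.
    """
    completed = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            # Free resources in the opposite order to their allocation.
            for _, done_undo in reversed(completed):
                done_undo()
            raise WorkflowError(f"step '{name}' failed: {exc}") from exc
        completed.append((name, undo))
    return [name for name, _ in completed]
```

Undoing in reverse order matters when later resources depend on earlier ones, for example a volume that is attached to an instance.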
Job Scheduling
 Scheduled jobs are executed using Quartz Enterprise Job Scheduler
 Quartz runs in clustered configuration
 Jobs are executable by any TopStack instance
 Scheduled jobs are stored in relational DB
 Services may add new jobs to be executed during e.g. maintenance windows
 Quartz is a source of workflow jobs
 For example, on setting RDS maintenance window, a Quartz job is created
 When Quartz job fires, RDS code is invoked to submit workflow
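The hand-off from a scheduled job to a workflow can be sketched with a toy job store. This is not Quartz: `JobStore` and its methods are hypothetical, jobs live in memory rather than in a relational database, and fire times are plain numbers instead of cron triggers.

```python
import heapq
import itertools


class JobStore:
    """Toy scheduled-job store (hypothetical; Quartz persists jobs in a DB)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker for equal fire times

    def schedule(self, fire_at, job):
        """Register a callable to run once its fire time has been reached."""
        heapq.heappush(self._heap, (fire_at, next(self._order), job))

    def fire_due(self, now):
        """Run every job whose fire time has passed; return how many fired."""
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, job = heapq.heappop(self._heap)
            job()
            fired += 1
        return fired


# Setting an RDS maintenance window creates a job; when the job fires,
# it submits a workflow (represented here by appending to a list).
submitted = []
store = JobStore()
store.schedule(fire_at=100, job=lambda: submitted.append("rds-maintenance-workflow"))
store.fire_due(now=50)   # nothing is due yet
store.fire_due(now=100)  # the job fires and submits the workflow
```

Since the job only submits a workflow and does none of the work itself, any instance that happens to fire it produces the same result.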
Common Components
Common Logging
  Logging from all TopStack services is performed through the SLF4J library
  Logging implementation is typically Log4J
  Logging may be directed to syslog (including TCP) or simple files
  Logging configuration allows output to be aggregated and mined
Authorization & Access Control
  Each TopStack account requires an active IaaS cloud credential set
 IaaS credentials are encrypted at rest
 Actions are performed using credentials associated with TopStack account
 IaaS authorization and access limits define TopStack limits
Instance Configuration
 Chef Server
 Deployment includes an embedded Chef server (http://www.opscode.com/chef)
 Embedded Chef includes a set of Transcend recipes to build up resources
 Chef Client
  The Transcend Base Image has a Chef client baked into the image
  As new instances are started by TopStack, a Chef configuration and role are injected
  Instances dial back to TopStack as the final step of configuration to become ready
Persistence
  Configuration and event data are stored in a relational database (default MySQL)
 Data access is through a DAO layer and Hibernate, an O/R mapping layer
Cloud Platform Bindings
  TopStack configuration requires a cloud “flavor” as input: OpenStack, Eucalyptus, etc.
 IaaS cloud must provide the core operations used by TopStack (or equivalents):
 Create/Terminate VM Instance
 Allocate/Release IP Address
 Associate/Disassociate IP Address
 Describe Instances
 Create/Delete Security Group
 Describe Security Groups
 Authorize/Revoke Security Ingress
 Create/Delete Volume
 Describe Volume
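One way to picture a binding is as an interface that each cloud flavor implements. The sketch below covers only a subset of the operations listed above (instances and IP addresses); the remaining operations would follow the same pattern. `CloudBinding`, `InMemoryCloud`, `BINDINGS`, and `binding_for` are hypothetical names, and the in-memory fake is an assumption for illustration, not TopStack or Dasein code.

```python
from abc import ABC, abstractmethod


class CloudBinding(ABC):
    """Subset of the core IaaS operations (hypothetical interface)."""

    @abstractmethod
    def create_instance(self, image_id): ...

    @abstractmethod
    def terminate_instance(self, instance_id): ...

    @abstractmethod
    def describe_instances(self): ...

    @abstractmethod
    def allocate_address(self): ...

    @abstractmethod
    def associate_address(self, ip, instance_id): ...


class InMemoryCloud(CloudBinding):
    """Fake binding; a real one would call the IaaS provider's API."""

    def __init__(self):
        self._instances = {}  # instance_id -> {"image": ..., "ip": ...}
        self._free_ips = []
        self._instance_seq = 0
        self._ip_seq = 0

    def create_instance(self, image_id):
        self._instance_seq += 1
        instance_id = f"i-{self._instance_seq:04d}"
        self._instances[instance_id] = {"image": image_id, "ip": None}
        return instance_id

    def terminate_instance(self, instance_id):
        del self._instances[instance_id]

    def describe_instances(self):
        return dict(self._instances)

    def allocate_address(self):
        self._ip_seq += 1
        ip = f"192.0.2.{self._ip_seq}"  # documentation-only address range
        self._free_ips.append(ip)
        return ip

    def associate_address(self, ip, instance_id):
        self._free_ips.remove(ip)  # raises ValueError if the IP was not allocated
        self._instances[instance_id]["ip"] = ip


# The configured cloud "flavor" selects which binding to instantiate.
BINDINGS = {"in-memory": InMemoryCloud}


def binding_for(flavor):
    return BINDINGS[flavor]()
```

Service code written against `CloudBinding` does not change when a new flavor is added; only the `BINDINGS` table grows.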
Quotas & Metering
 All quotas enforced by IaaS provider apply to TopStack instances as well
 Some quota is consumed by TopStack constructs that map to quota items
 E.g., RDS security group consumes an IaaS security group
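The mapping from TopStack constructs to IaaS quota items can be sketched as a small ledger. Only the RDS security group example comes from the text above; `QuotaLedger`, `CONSTRUCT_COSTS`, `create_construct`, and the limit values are hypothetical.

```python
class QuotaExceeded(Exception):
    """Raised when creating a construct would exceed an IaaS quota."""


class QuotaLedger:
    """Tracks IaaS quota consumed for one account (hypothetical model)."""

    def __init__(self, limits):
        self.limits = dict(limits)
        self.used = {item: 0 for item in limits}

    def consume(self, item, amount=1):
        if self.used[item] + amount > self.limits[item]:
            raise QuotaExceeded(f"{item}: limit of {self.limits[item]} reached")
        self.used[item] += amount


# Which IaaS quota items each TopStack construct draws on (illustrative).
CONSTRUCT_COSTS = {
    "rds_security_group": {"security_groups": 1},
}


def create_construct(ledger, construct):
    """Charge the IaaS quota items that back a TopStack construct."""
    for item, amount in CONSTRUCT_COSTS[construct].items():
        ledger.consume(item, amount)
```

With a limit of two security groups, a third `create_construct(ledger, "rds_security_group")` call raises `QuotaExceeded` and leaves the usage count unchanged.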
Instrumentation
 All TopStack hosts are monitored as CloudWatch instances
 Installation process configures hosts
  Metrics are available through normal CloudWatch APIs
  All TopStack service hosts expose basic management information
  All hosted services are listed, with service status
  Service workers (workflow steps) maintain “health” information
  Count of tasks processed
  Count of tasks with abnormal outcome
 Transactions processed per second
 Collected via metrics (http://metrics.codahale.com/)
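The three worker health figures above can be sketched as a pair of counters plus an elapsed-time rate. This is an illustration only: `WorkerHealth` is a hypothetical class and does not reproduce the API of the Metrics library, and the rate shown is a simple lifetime average rather than a moving average.

```python
import threading
import time


class WorkerHealth:
    """Health counters for one service worker (hypothetical class)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._started = clock()
        self._lock = threading.Lock()
        self.processed = 0
        self.abnormal = 0

    def record(self, ok=True):
        """Count one finished task, flagging abnormal outcomes."""
        with self._lock:
            self.processed += 1
            if not ok:
                self.abnormal += 1

    def snapshot(self):
        """Return the counters and the average tasks per second so far."""
        with self._lock:
            elapsed = self._clock() - self._started
            rate = self.processed / elapsed if elapsed > 0 else 0.0
            return {
                "processed": self.processed,
                "abnormal": self.abnormal,
                "per_second": rate,
            }
```

A management endpoint could serve `snapshot()` directly, which is one way a host might expose per-worker status.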
Administration
 TopStack Enterprise Edition provides an Administration Console
 Console runs on each TopStack host
 Allows central administration of services
 Allows provisioning of user accounts
 Provides information on active services, failure rates, scheduled jobs
Inter-Service Communication
 TopStack services communicate with each other only as workflow steps
 Subsequent workflow steps are routed through Pub/Sub queue
 Loosely coupled, via workflow
Non-functional requirements
High Availability
 TopStack service hosts run in parallel on different VMs; scale-out architecture
 VMs may be removed from service & load will redistribute across remaining instances
 Workflows in progress will be continued by other instances
 TopStack persistence tier may be run in master/slave or cluster configuration
Scalability
 TopStack host machines can run any or all TopStack services
 TopStack endpoints are load balanced across available service hosts
 Many service hosts can run in an environment; new hosts register services on start
 TopStack persistence tier scales vertically to support large transaction volumes
Portability
 The Dasein cross-cloud library allows TopStack to operate against the most popular
clouds
 TopStack assumes only core IaaS services are available
 Most clouds provide core IaaS services, or services which may be mapped to IaaS
Security
  TopStack services are secured with an access key and a secret key/password
  Optionally, customers can add an HSM for increased security
  The secret key/password is never transmitted without encryption
  Enterprise Edition provides additional OS-level lock-downs (PCI DSS)
