The tools and technologies used to power the modern data center are evolving faster than most companies can keep up with. Aging web services built on LAMP, WAMP, or ASP cannot readily take advantage of the latest scalable web platforms and technologies. In this presentation, we will discuss the factors that must be considered in order for your aging web service to take advantage of technologies such as Apache Mesos, Marathon, Docker, Apache Kafka, and more.
This talk is intended for software developers, operations staff, and IT managers who are looking to modernize existing privately-hosted web applications. We will look at the transformation of the data center from a high-level perspective, examining before-and-after topology examples using Key Performance Indicators and Key Performance Metrics to show how leveraging modern design principles can both improve application performance and reduce operational costs. Next, we will look at some example applications and show what needs to be done from both the software development and infrastructure perspectives to successfully accomplish the transformation.
2. Scope
• Provide a model for understanding how to leverage emerging
technologies to improve the operation of your data center.
• Determine the real business value of this model.
• Examine the work required to migrate a traditional web service to
this model.
• Technical Notes
• Single private data center deployment
• Content delivery services only, no “big data”
• Linux servers only
• All services are internally stateless
• Server-level fault-tolerance only. Data center failover and high-availability
networking are great subjects for other talks.
3. Definitions
• Server: Physical compute hardware, with or without a hypervisor.
• Environment: Everything necessary to allow a service to execute.
There are three main types:
• Bare Metal: server->OS->ecosystem->service
• Virtualized: server->hypervisor->VM->OS->ecosystem->service
• Container: server->OS->ecosystem->container->service
• VM: An instance of an operating system running within logically-
defined system resource and network specifications.
• Ecosystem: The set of tools, utilities, and agents running on a VM
necessary to support a service.
• Service: A process that performs a discrete unit of functionality
which is considered the primary unique function or purpose of the
environment.
4. Definitions: Types of Services
• Application Service: A service that directly contributes to the
operation of the customer-facing product.
• Web services, web servers, databases, identity management, data processing,
ETL, etc.
• Platform Service: A service that supports business-level services.
• Block storage, load balancer, message queue
• Infrastructure Service: A service that provides administrative
operational functionality only. These services have no impact on the
functionality of the customer-facing product.
• Performance monitoring, log aggregation, VM provisioning, service
orchestration, network management
• Developer Service: An application which runs in its own environment
and provides functionality supporting the process of building and
deploying source code.
• Jenkins, Capistrano, git, Artifactory
5. Designing For Scale:
Reference Architecture
[Architecture diagram: the Internet feeds /user/login and /messages/getRecent
servlets on Server 1 and Server 2, each with a monitoring agent; Server X, a
monitoring service, a service bus adapter, identity management, a message
database, a service container repository, and the cloud controller complete
the picture.]
Legend
Red Lines = Public / DMZ
White Lines = Backbone VLAN
Solid White Line = Message Bus
Yellow Lines = Control VLAN
6. Designing For Scale: Reference Architecture
• VLAN segmentation for intrusion prevention
• Public/DMZ VLAN isolates the services which are first to handle incoming
requests.
• Backbone VLAN isolates the supporting application services from Internet
requests.
• Management VLAN isolates management functions, such as service provisioning,
from everything else. Since this VLAN doesn’t have the same bandwidth
requirements as the other two, it is often physically separated by connecting it
to a lower-bandwidth switch.
7. Designing For Scale: Reference Architecture
• “Micro-Services”
• Traditional web service implementations group all service method
implementations into a single unit.
• This unit represents the smallest unit of service scalability.
• Regardless of whether you are planning for capacity manually or via a dynamic
provisioning system, you must characterize the resource utilization of each
unique functional unit under progressively increasing load.
• Since no two web service methods consume exactly the same resources - and
some may differ dramatically – you must assume the worst-case scenario.
• This results in over-estimated resource needs, additional complexity, and
underutilized resources.
• Micro-services is a term for services which have been reduced to their smallest
discrete functional level. In the case of web services, it is usually a single web
service method.
8. Designing For Scale: Reference Architecture
• Service Bus
• Provides connection-level decoupling which is essential for automating the
deployment of entire systems of services (orchestration).
• Establishes a common, consistent messaging mechanism for:
• Horizontal scaling of services.
• Minimizing impact of failed servers.
• Parallel processing of requests.
• Decoupling makes it much easier to radically change the implementation of an
individual service.
• Most services which do not support a bus natively can be fitted with adapters.
9. Designing For Scale: Reference Architecture
• Server resource monitoring
• Resource monitoring is the cornerstone of any dynamically or elastically scalable
system.
• Agents installed alongside all services report resource utilization metrics to a
central server.
• The central server can be configured with thresholds which trigger alerts.
• Alerts may be handled by automation or signal an administrator or both.
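The alerting loop above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical metric names and threshold values; a production system would use an off-the-shelf monitoring tool rather than hand-rolled checks.

```python
# Minimal sketch of threshold-based resource alerting. The metric names
# ("cpu_pct", "mem_pct") and thresholds are illustrative assumptions.

THRESHOLDS = {"cpu_pct": 85.0, "mem_pct": 90.0}

def check_metrics(server, metrics, thresholds=THRESHOLDS):
    """Return an alert message for every metric exceeding its threshold."""
    alerts = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            alerts.append(f"{server}: {name}={value} exceeds {limit}")
    return alerts

# An agent on each server would periodically report a metrics dict like this:
print(check_metrics("web-01", {"cpu_pct": 92.5, "mem_pct": 40.0}))
```

The central server would run such checks against every agent's reports and hand any resulting alerts to automation, an administrator, or both.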
10. Designing For Scale: Reference Architecture
• Application Containers
• In the beginning, services were hosted on dedicated servers.
• In an effort to reduce hardware and maintenance costs, virtualization was invented.
• Virtualization allowed administrators to condense many dedicated servers down to
one by running exact images of those servers on bigger, but oversubscribed, hardware.
• Virtualization maximized flexibility by permitting any combination of operating
systems to run in complete isolation from one another.
• This appealed to administrators who had to support systems written by blind
stuntmen from outer space who couldn’t seem to agree on anything.
• However, in situations where a general consensus could be reached about which
operating system and supporting ecosystem to use, this scheme was a horrible waste.
• Application containers provide isolation at the process level in order to avoid the
overhead of instantiating copies of the OS and ecosystem needlessly.
11. Designing For Scale:
The Evolution of Application Stacks
[Diagram of three application stacks, left to right:
• Dedicated Servers: server → operating system → ecosystem → service
• Virtualization: server → hypervisor → virtual machines, each with its own
operating system → ecosystem → service
• Containers: server → operating system → ecosystem → containers, each
running a service]
12. Designing For Scale: Reference Architecture
• Cloud Controller
• Tracks server resource allocation (not utilization)
• Maintains inventory of service container deployment.
• Once deployed, containers reserve resources (CPU, memory, network) as stated in their manifest.
• Records total available resources for each server when they join the cloud.
• Tracks how much of each type of resource each server has available.
• Receives alerts from performance monitoring system.
• Provisions service containers to the server with the most resources.
• Service Container Repository
• Maintains “Golden Masters” of each containerized service
• To containerize a service:
• Create a manifest which describes the service’s resource needs.
• Use your chosen container utility to package it.
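The controller's placement rule can be sketched as follows. This is a toy model under stated assumptions: the manifest fields (`cpus`, `mem_mb`) and server names are illustrative, not any real manifest format.

```python
# Sketch of the cloud controller's bookkeeping: record each server's total
# resources when it joins, then reserve (not measure) the resources a
# container's manifest declares, placing it on the server with the most headroom.

class CloudController:
    def __init__(self):
        self.servers = {}   # server name -> free {"cpus", "mem_mb"}

    def join(self, name, cpus, mem_mb):
        """Record a server's total available resources when it joins the cloud."""
        self.servers[name] = {"cpus": cpus, "mem_mb": mem_mb}

    def provision(self, manifest):
        """Place a container on the fitting server with the most free resources."""
        candidates = [
            (name, free) for name, free in self.servers.items()
            if free["cpus"] >= manifest["cpus"] and free["mem_mb"] >= manifest["mem_mb"]
        ]
        if not candidates:
            return None  # no server can host this container
        name, free = max(candidates, key=lambda c: (c[1]["cpus"], c[1]["mem_mb"]))
        free["cpus"] -= manifest["cpus"]        # a reservation, not utilization
        free["mem_mb"] -= manifest["mem_mb"]
        return name

ctl = CloudController()
ctl.join("server-1", cpus=8, mem_mb=16384)
ctl.join("server-2", cpus=16, mem_mb=32768)
print(ctl.provision({"cpus": 4, "mem_mb": 2048}))  # placed on server-2
```

Note the distinction the slide draws: the controller tracks what containers have *reserved*, while the monitoring system tracks what they actually *use*.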
13. Designing For Scale: Reference Architecture
• Load Balancer
• Point of entry for all incoming requests
• REST-aware service pooling
• Map URL path (method) to pool of micro-service instances.
• Resource-aware load balancing
• Send request to service instance with the most available resources.
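The two bullets above combine into a simple routing rule, sketched here with the URL paths from the reference architecture and made-up instance names and capacity scores.

```python
# Sketch of REST-aware, resource-aware routing: each URL path (web service
# method) maps to a pool of micro-service instances, and a request goes to
# the instance reporting the most free capacity. Scores are illustrative.

pools = {
    "/user/login": {"login-a": 0.7, "login-b": 0.3},        # instance -> free capacity
    "/messages/getRecent": {"msg-a": 0.5, "msg-b": 0.9},
}

def route(path, pools=pools):
    """Return the pool member with the most available resources."""
    pool = pools.get(path)
    if not pool:
        raise KeyError(f"no service pool registered for {path}")
    return max(pool, key=pool.get)

print(route("/messages/getRecent"))  # msg-b reports the most free capacity
```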
14. Legacy Application:
Example Architecture
[Architecture diagram: Server 1 and Server 2 each host VMs (VM / OS →
ecosystem → web service); a message database, identity management, user
metrics, a virtual machine repository, and the cloud controller complete
the picture.]
Legend
Red Lines = Public / DMZ
White Lines = Backbone VLAN
Solid White Line = Message Bus
Yellow Lines = Control VLAN
15. Legacy Application: Compare & Contrast
• Capacity planning
• Legacy: Stress test every method in the interface of each service until it cannot
return a response within the acceptable range.
• The first method to fail determines the maximum number of concurrent requests an
instance can handle.
• The method which uses the most resources just before it fails determines the resource
utilization metrics for the service.
• Develop a theoretical model of the anticipated customer demand for your service.
• Divide the maximum number of concurrent requests from the model by the number of
concurrent requests a service instance can handle and multiply by the resources
required.
• Repeat for all services
• New: Let the platform identify the weakest link and just focus on that.
• Deploy full service environment on existing, leased, or public cloud infrastructure.
• Only stress the external API.
• Observe which service the cloud controller dynamically creates the most of.
• Provision for the needs of the service while working on improving it.
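The legacy arithmetic above can be worked through with made-up numbers: divide the modeled peak concurrency by what one instance handles before its weakest method fails, then multiply by the worst-case resource footprint of a single instance.

```python
import math

# Sketch of legacy capacity planning. All inputs are illustrative: a modeled
# peak of concurrent requests, the per-instance ceiling found by stress
# testing, and the worst-case resource footprint of one instance.

def legacy_capacity(peak_concurrent, per_instance_capacity, per_instance_resources):
    instances = math.ceil(peak_concurrent / per_instance_capacity)
    return {res: qty * instances for res, qty in per_instance_resources.items()}

# e.g. 10,000 modeled concurrent requests, 400 per instance before the
# weakest method fails, each instance sized for its worst-case method:
print(legacy_capacity(10_000, 400, {"cpus": 2, "mem_mb": 4096}))
# 25 instances -> {'cpus': 50, 'mem_mb': 102400}
```

Repeating this for every service, always at worst case, is exactly what produces the over-estimation the earlier slide describes.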
16. Legacy Application: Compare & Contrast
• Resource Utilization
• Legacy: Allocate resources to VMs
• One service per VM
• Resource utilization for each service instance includes OS and ecosystem
• New: Dynamic provisioning of micro-services
• Define essential unit of functionality
• Unlimited number of service instances can share resource pool
• Resource utilization for each service instance includes almost no overhead
17. Legacy Application: Compare & Contrast
• Fault Tolerance
• Legacy: It’s all very special
• With the exception of the externally-facing web services, the fault-tolerance
mechanism is undefined.
• Therefore, each service may, or may not, implement a proprietary mechanism for load
balancing and/or fault tolerance.
• Each service may require unique administrative tasks to be performed to accomplish
redundancy.
• New: Built into the platform
• The platform mandates a common strategy for horizontal scaling, load balancing, fault
tolerance, and parallelism
• The platform provides all the necessary tools and utilities services need.
• Scalability and redundancy logic is common across all services.
18. Legacy Application: Compare & Contrast
• Configuring IPC connections during automated provisioning
• Legacy: This is all very special, too
• Each service defines where it keeps IPC connection configuration information.
• Complex service orchestration logic is needed to determine the upstream impact of
adding a new service instance or changing the location of a service instance.
• Each service requires custom logic to update the connection configuration and
likely also needs to restart the service.
• New: No direct connections
• All backbone communications go through a service bus.
• Downstream services subscribe to their associated bus, queue, or “topic”
• Upstream services publish generic requests to the intended recipient’s topic
• Downstream services publish responses to the sender’s topic.
• The service bus is controlled by a “broker” which reliably handles all scalability, load
balancing, and fault-tolerance activities.
19. Business Value of Adopting the New Model
• Maximize server resource utilization
• Reduce hardware, network, software, support and electricity costs.
• Simplify operations and reduce maintenance work.
• Minimize impact of hardware failures
• Improve customer satisfaction
• Increase value of service to customers
• Avoid non-compliance with availability SLAs
• Dynamically respond to changes in customer usage patterns or
volume
• Minimize response time of customer requests
• Avoid non-compliance with scalability SLAs
• Avoid wasting resources due to over-provisioning low-volume services
20. Bridging the Gap: Design Principles
• Homogeneity
• Operating System
• Ecosystem
• One tool for each purpose.
• Services may depend on different versions of static libraries.
• IPC over ESB
• Native support for busses
• Keep the messages as generic as possible
• Micro-services
• Determine minimum unit of discrete functionality.
• Each service should perform exactly one function.
21. Bridging the Gap: Tools
• Docker
• Service container utility
• Container definition
• Packaging
• Local repository
• Local provisioning
• Execution
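As a concrete illustration, a container definition is a Dockerfile. This one is a hedged sketch: the base image, file name, and port are hypothetical stand-ins for whatever your service actually needs.

```dockerfile
# Hypothetical container definition for a single micro-service.
# Base image, file name, and port are illustrative assumptions.
FROM python:3-slim
COPY service.py /app/service.py
EXPOSE 8080
CMD ["python", "/app/service.py"]
```

Packaging and local execution then follow the standard Docker workflow, e.g. `docker build -t login-service .` to package and `docker run -p 8080:8080 login-service` to execute locally.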
22. Bridging the Gap: Tools
• Apache Kafka
• Reliable messaging system / bus
• Modes
• Publish-Subscribe
• Load balanced
• Hybrid
• Each service type is defined as a “topic”
• Multiple instances of a service can be grouped for load balancing and fault
tolerance
• Multiple groups can subscribe to a topic for parallel processing
• Some instance-specific configuration would be needed to ensure that each
service is doing unique work.
23. Bridging the Gap: Tools
• Apache Mesos
• This is the Load Balancer in the reference architecture
• Abstracts hardware into a pool of server resources.
• Allows long-running services and batch processes to share the same resource
pool.
• Routes incoming requests to the appropriate service type (task).
• ApacheAurora
• Extends Mesos by managing the provisioning of multiple instances (tasks) of
each service type (job).
• This additional context enables resource-based routing.
• Marathon
• This is the Cloud Controller in the reference architecture
• Provisions services and manages resource allocation
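For a flavor of what provisioning looks like in practice, here is a hedged sketch of a Marathon application definition; the id, image, and sizing values are hypothetical, and real deployments should follow the Marathon documentation for the current schema.

```json
{
  "id": "/login-service",
  "cpus": 0.5,
  "mem": 256,
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/login-service:latest",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 8080 }]
    }
  }
}
```

Posting a definition like this to Marathon's REST API asks it to keep three instances of the container running somewhere in the Mesos resource pool, which is precisely the cloud controller role in the reference architecture.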
24. Related Tools For Specific Situations
• Spark
• Batch processing of data
• Like a dynamically scalable Hadoop
• Storm
• Parallel processing of data streams
• Chronos
• It’s cron on Mesos
• It eats ETL jobs for lunch
The Apache Mesos Ecosystem
25. The Future:
Where No Data Center Has Gone Before
• The Data Center Computer
• Your laptop doesn’t ask you which CPU core you would like to run Chrome on, so
why should your data center?
• The principles and tools described in this presentation provide the essential
building blocks for turning your data center into one big task execution platform.
• Rather than waste time assigning processes to servers, why can’t I just log into a
control console (or issue an API command) to start a new service or update an
existing one?
• This is the explicit goal of projects like Mesos.
• Significant challenges still impede the release of any general-purpose, turn-key
solution, but with the kinds of services and operational scope you work with, it
may already be possible in your situation.