Intended for customers who have (or will have) thousands of instances on AWS, this session is about reducing the complexity of managing costs for these large fleets so that they run efficiently. Attendees will learn about common roadblocks that prevent large customers from optimizing costs, tools they can use to remove those roadblocks efficiently, and techniques to monitor their rate of cost optimization. The session includes a case study that details the millions of dollars saved using these techniques. Customers will also learn about a range of templates they can use to implement these techniques quickly, as well as partners who can help them implement these templates.
2. What to expect
We will introduce our approach for building the business case for moving to the cloud and share tips from some of our most innovative customers, who are able to successfully architect for cost optimization in order to realize the economics of the AWS Cloud.
4. What is TCO?
Definition: A comparative total cost of ownership analysis (acquisition and operating costs) for running an infrastructure environment end-to-end on-premises vs. on AWS.
Used for:
1) Comparing the costs of running an entire infrastructure environment or a specific workload on-premises or in a co-location facility vs. on AWS
2) Budgeting and building the business case for moving to AWS
6. TCO = acquisition costs + operations costs
Cost categories (illustrative):
Server costs
• Hardware: server, rack chassis, PDUs, ToR switches (+maintenance)
• Software: OS, virtualization licenses (+maintenance)
• Facilities cost: space, power, cooling
Storage costs
• Hardware: storage disks, SAN/FC switches
• Storage admin costs
• Facilities cost: space, power, cooling
Network costs
• Network hardware: LAN switches, load balancer
• Bandwidth costs
• Network admin costs
• Facilities cost: space, power, cooling
IT labor costs
• Server admin, virtualization admin
This breakdown doesn’t include every cost item. For example, software costs can include database, management, and middle-tier software costs. Facilities costs can include costs associated with upgrades, maintenance, building security, taxes, and so on. IT labor costs can include security admin and application admin costs.
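The TCO formula above can be made concrete with a small calculator. This is a minimal Python sketch, assuming each cost item has a one-time acquisition cost and a recurring annual operations cost; the `CostItem` type, the item names, and every dollar figure are hypothetical placeholders, not AWS data. Running the same functions on an on-premises inventory and on an AWS estimate gives the side-by-side comparison the TCO definition calls for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostItem:
    category: str             # e.g. "Server", "Storage", "Network", "IT labor"
    name: str
    acquisition: float        # one-time purchase cost, USD
    annual_operations: float  # recurring cost per year, USD


def item_tco(item: CostItem, years: int) -> float:
    # TCO = acquisition costs + operations costs over the analysis period
    return item.acquisition + item.annual_operations * years


def total_cost_of_ownership(items: list[CostItem], years: int) -> float:
    return sum(item_tco(i, years) for i in items)


def tco_by_category(items: list[CostItem], years: int) -> dict[str, float]:
    totals: dict[str, float] = {}
    for i in items:
        totals[i.category] = totals.get(i.category, 0.0) + item_tco(i, years)
    return totals


# Hypothetical on-premises inventory (illustrative numbers only)
ON_PREM_ITEMS = [
    CostItem("Server", "Server hardware", 100_000, 10_000),
    CostItem("Server", "OS and virtualization licenses", 20_000, 4_000),
    CostItem("Storage", "Storage disks and SAN/FC switches", 50_000, 5_000),
    CostItem("Network", "Bandwidth", 0, 12_000),
    CostItem("IT labor", "Server admin", 0, 90_000),
]

if __name__ == "__main__":
    print(total_cost_of_ownership(ON_PREM_ITEMS, years=3))  # 533000
    print(tco_by_category(ON_PREM_ITEMS, years=3))
```

Keeping acquisition and operations separate makes it easy to see how much of the total is up-front capital versus recurring spend.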
7. Resources to get you started
AWS TCO Calculator
https://awstcocalculator.com
Case studies and research
http://aws.amazon.com/economics/
8. What’s missing from a traditional TCO?
Economic criteria categories, each of which a traditional TCO includes, partially includes, or does not include:
• Infrastructure Comparison
• Capacity Planning Benefits
• Financial Benefits of Innovation
• Cost Avoidance
• Workforce Productivity
• Accelerated Time To Value/Market
• Cost to Achieve (Migration, Platform, Training)
• Legacy Constraints
14. The four pillars of cost optimization
• Right-sizing
• Reserved Instances
• Increase elasticity
• Measure, monitor, and improve
15. Right-sizing
• Selecting the cheapest instance available that still meets performance requirements
• Looking at CPU, RAM, storage, and network utilization to identify instances that can be downsized
• Leveraging Amazon CloudWatch metrics and setting up custom RAM metrics (EC2 does not publish memory utilization to CloudWatch by default)
Rule of thumb: Right-size, then reserve.
(But if you’re in a pinch, reserve first.)
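The right-sizing bullets can be sketched as a selection step: scale the current instance's peak CPU and memory utilization into required capacity, add headroom, and pick the cheapest type that still fits. This Python sketch is illustrative only. The 20% headroom is an assumption, the catalog covers only vCPU and memory (a real review should also weigh storage and network), and while the m4.large and t2.medium rates come from the deck's own slide-19 example, the m4.xlarge rate is a placeholder. t2 instances are burstable, so sustained CPU load needs a separate check against CPU credits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


CATALOG = [
    InstanceType("t2.medium", 2, 4, 0.052),   # rate from the slide-19 example
    InstanceType("m4.large", 2, 8, 0.12),     # rate from the slide-19 example
    InstanceType("m4.xlarge", 4, 16, 0.24),   # placeholder rate; check current pricing
]


def right_size(current: InstanceType, peak_cpu_pct: float, peak_mem_pct: float,
               catalog: list[InstanceType], headroom: float = 0.2) -> InstanceType:
    """Return the cheapest type whose vCPU and memory cover observed peak usage plus headroom."""
    # Peak CPU would come from CloudWatch; peak memory from a custom metric.
    needed_vcpus = current.vcpus * peak_cpu_pct / 100 * (1 + headroom)
    needed_mem = current.memory_gib * peak_mem_pct / 100 * (1 + headroom)
    fits = [t for t in catalog if t.vcpus >= needed_vcpus and t.memory_gib >= needed_mem]
    # Keep the current type if nothing in the catalog fits.
    return min(fits, key=lambda t: t.hourly_usd) if fits else current


if __name__ == "__main__":
    xlarge = CATALOG[2]
    print(right_size(xlarge, peak_cpu_pct=30, peak_mem_pct=20, catalog=CATALOG).name)  # t2.medium
```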
16. Reserved Instances
Commitment level: 1 year or 3 years
AWS services offering RIs:
• Amazon EC2
• Amazon RDS
• Amazon DynamoDB
• Amazon Redshift
• Amazon ElastiCache
* Dependent on specific AWS service, size/type, and region
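Whether a 1-year or 3-year commitment pays off depends on how many hours the resource actually runs. A minimal Python sketch of the break-even arithmetic, assuming a reservation priced as an upfront payment plus a recurring hourly fee; the upfront and hourly figures in the example are hypothetical, and real RI terms, payment options, and discounts vary by service, size/type, and region.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def effective_ri_hourly(upfront_usd: float, recurring_hourly_usd: float, term_years: int) -> float:
    # Spread the upfront payment evenly over every hour of the term.
    return upfront_usd / (term_years * HOURS_PER_YEAR) + recurring_hourly_usd


def break_even_utilization(on_demand_hourly_usd: float, upfront_usd: float,
                           recurring_hourly_usd: float, term_years: int) -> float:
    """Fraction of the term the resource must run for the reservation to beat on-demand."""
    return effective_ri_hourly(upfront_usd, recurring_hourly_usd, term_years) / on_demand_hourly_usd


if __name__ == "__main__":
    # Hypothetical 1-year reservation compared with the $0.12/hr m4.large rate from slide 19
    share = break_even_utilization(0.12, upfront_usd=438, recurring_hourly_usd=0.03, term_years=1)
    print(f"Reserve if the instance runs more than {share:.0%} of the time")  # 67%
```

An always-on resource is far above any such threshold, which is why always-on usage is the first thing to cover.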
17. Reserved Instances
Step 1: RI Coverage
• Cover always-on resources.
Step 2: RI Utilization
• Leverage RI flexibility to increase utilization.
• Merge and split RIs as needed.
Rule of thumb: Target 70–80% always-on coverage and a 95% RI utilization rate.
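The two steps map onto two ratios that can be checked against the rule of thumb. This Python sketch assumes coverage means the share of running instance-hours billed at reserved rates (consistent with leaving 20–30% on demand for elasticity) and utilization means the share of purchased reserved hours actually applied; the hour counts in the example are hypothetical.

```python
def ri_coverage(reserved_hours_used: float, total_running_hours: float) -> float:
    """Share of running instance-hours billed at reserved rates."""
    return reserved_hours_used / total_running_hours


def ri_utilization(reserved_hours_used: float, reserved_hours_purchased: float) -> float:
    """Share of purchased reserved hours that were applied to running instances."""
    return reserved_hours_used / reserved_hours_purchased


def review(coverage: float, utilization: float,
           coverage_target: tuple[float, float] = (0.70, 0.80),
           utilization_target: float = 0.95) -> list[str]:
    findings = []
    low, high = coverage_target
    if coverage < low:
        findings.append("low coverage: reserve more of the always-on usage")
    elif coverage > high:
        findings.append("high coverage: leave more headroom on demand for elasticity")
    if utilization < utilization_target:
        findings.append("low utilization: modify (merge or split) existing RIs to fit running instances")
    return findings


if __name__ == "__main__":
    # Hypothetical month: 10,000 running instance-hours, 8,000 reserved hours bought, 7,500 applied
    cov = ri_coverage(7_500, 10_000)     # 0.75, inside the 70-80% target
    util = ri_utilization(7_500, 8_000)  # 0.9375, below the 95% target
    print(review(cov, util))
```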
18. Increase elasticity
Turn off nonproduction instances
• Look for dev/test and other nonproduction instances that are running always-on, and turn them off.
Autoscale production
• Use Auto Scaling to scale out and in based on demand and usage (for example, spikes).
Rule of thumb: Shoot for 20–30% of Amazon EC2 instances running On-Demand to handle elasticity needs.
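Turning off nonproduction instances is usually a schedule. This Python sketch assumes a hypothetical `environment` tag and a weekday 08:00–20:00 working window (both placeholders for your own conventions) and computes the share of weekly hours an instance spends stopped. Actual savings are smaller than that share suggests, because charges such as attached EBS storage continue while an instance is stopped.

```python
from datetime import datetime


def should_run(environment: str, now: datetime, start_hour: int = 8, stop_hour: int = 20) -> bool:
    """Production always runs; nonproduction runs only on weekdays during working hours."""
    if environment == "production":
        return True
    is_weekday = now.weekday() < 5  # Monday == 0 ... Friday == 4
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_hours_saved_fraction(start_hour: int = 8, stop_hour: int = 20, workdays: int = 5) -> float:
    """Fraction of a 168-hour week a nonproduction instance spends stopped under this schedule."""
    running_hours = (stop_hour - start_hour) * workdays
    return 1 - running_hours / (7 * 24)


if __name__ == "__main__":
    print(should_run("dev", datetime(2024, 1, 6, 10)))              # Saturday -> False
    print(f"{weekly_hours_saved_fraction():.0%} of hours stopped")  # 64% of hours stopped
```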
19. Using right-sizing and elasticity to lower cost
More smaller instances vs. fewer larger instances
• 29 m4.large @ $0.12/hr = $2,505.60/mo*
• 59 t2.medium @ $0.052/hr = $2,208.96/mo*
*Assumes Linux instances in the US East (N. Virginia) Region at 720 hours per month
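The monthly figures follow from count × hourly rate × 720 hours. A short Python check using `Decimal` (to avoid floating-point rounding in currency) reproduces them. The memory totals (8 GiB per m4.large, 4 GiB per t2.medium) are added here to show that both fleets offer roughly the same memory, which is one plausible reading of why these counts were compared; because t2 instances are burstable, the comparison holds only for workloads that stay within their CPU credit balance.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # the slide's assumption


def monthly_cost(count: int, hourly_rate: str, hours: int = HOURS_PER_MONTH) -> Decimal:
    # Pass the rate as a string so Decimal keeps it exact.
    return count * Decimal(hourly_rate) * hours


if __name__ == "__main__":
    larger = monthly_cost(29, "0.12")    # 29 x m4.large
    smaller = monthly_cost(59, "0.052")  # 59 x t2.medium
    print(larger, smaller, larger - smaller)  # 2505.60 2208.960 296.640
    print(29 * 8, 59 * 4)                     # total GiB of memory: 232 236
```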
38. Today
What’s changed?
Infrastructure ownership is increasingly distributed to individual business groups, which are:
• Increasing their control over infrastructure
• Going “rogue” to achieve business agility
• Not always taking into account the importance of governance, compliance, and risk management
IT is increasingly an influencer/auditor instead of an owner.
39. Active policy management and automation
Implement Governance and Automation
Define policies to manage cost, usage, performance, and security, optimizing and empowering lines of business (LOBs) through centralized governance.
Improve Reservation Management
Model, manage, and modify reservations, and customize recommendations based on budget.
Optimize Performance (Right-sizing)
Leverage integrated data sources, define policies, and improve resource utilization by workload.
Simplify Management
Establish guidelines, automate tasks, and provide recommendations.
40. Real-world example
The devil is in the details
Challenge:
How to manage and optimize infrastructure performance for thousands of instances and more than 10 PB of Amazon S3 storage that support a global real estate marketplace?
Needs:
• Consolidated view across all infrastructure
• Eliminate zombie and underutilized assets
• Usage analysis tied to business groups
• Performance analysis by workload, department, and business groups
• More easily leverage reservations
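Finding zombie and underutilized assets comes down to a few thresholds applied to utilization history. This Python sketch is a hypothetical starting point: the 2% and 20% CPU cut-offs and the simple volume record shape are assumptions. CPU alone can mislead (an idle-looking instance may be busy on network or disk), so flagged assets need review before anything is terminated.

```python
from statistics import mean


def classify_instance(daily_avg_cpu_pct: list[float],
                      zombie_below: float = 2.0,
                      underutilized_below: float = 20.0) -> str:
    """Label an instance from daily average CPU utilization over a lookback window."""
    avg = mean(daily_avg_cpu_pct)
    if avg < zombie_below:
        return "zombie"          # candidate to stop or terminate
    if avg < underutilized_below:
        return "underutilized"   # candidate to right-size
    return "ok"


def unattached_volumes(volumes: list[dict]) -> list[str]:
    """IDs of storage volumes not attached to any instance (hypothetical record shape)."""
    return [v["id"] for v in volumes if not v.get("attachments")]


if __name__ == "__main__":
    print(classify_instance([0.5, 1.0, 0.8]))     # zombie
    print(classify_instance([12.0, 15.0, 18.0]))  # underutilized
    print(unattached_volumes([
        {"id": "vol-a", "attachments": []},
        {"id": "vol-b", "attachments": ["i-1"]},
    ]))                                           # ['vol-a']
```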
41. Solution and results
Solution: CloudHealth Cost Summary, RI Management, and Health Check Pulse reports, plus performance right-sizing paired with integrated Datadog metrics to recommend where to downgrade and/or optimize infrastructure.
Results: More than 30% improved performance. Benefits realized from EC2 right-sizing!
42. Real-world example
What’s in your cloud…$$$$
Challenge:
How to manage extremely dynamic infrastructure with hundreds of EC2 instances and better leverage reservations?
Needs:
• Visibility across an extremely dynamic and complex infrastructure
• Track and report resource utilization by business groups
• Right-size underutilized instances
• Model, modify, and purchase reservations to improve utilization
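Tracking cost by business group usually depends on a consistent tag. A minimal Python sketch, assuming each billing line item carries a hypothetical `business_group` tag; untagged spend is kept in its own bucket so gaps in tagging stay visible instead of disappearing from the report. The group names in the sample are placeholders.

```python
from collections import defaultdict


def costs_by_group(line_items: list[dict], tag_key: str = "business_group") -> dict[str, float]:
    """Sum line-item cost per business group; items missing the tag land in 'untagged'."""
    totals: defaultdict[str, float] = defaultdict(float)
    for item in line_items:
        group = item.get("tags", {}).get(tag_key, "untagged")
        totals[group] += item["cost_usd"]
    return dict(totals)


# Hypothetical billing line items
SAMPLE = [
    {"cost_usd": 100.0, "tags": {"business_group": "engineering"}},
    {"cost_usd": 50.5, "tags": {"business_group": "marketing"}},
    {"cost_usd": 25.25, "tags": {"business_group": "engineering"}},
    {"cost_usd": 10.0, "tags": {}},
]

if __name__ == "__main__":
    print(costs_by_group(SAMPLE))
    # {'engineering': 125.25, 'marketing': 50.5, 'untagged': 10.0}
```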
43. Solution and results
Solution: CloudHealth delivered visibility across all infrastructure, with functional business group reporting that aligned infrastructure costs with usage and performance. The analysis simplified reporting on reservation utilization and provided recommendations for reservation modifications and purchases.
Results: More than 150 employee hours and $75,000 saved. Benefits realized from RI modifications alone!
44. Real-world example
Automation to simplify cloud service management
Challenge:
How to promote agility across individual business groups while also implementing centralized governance?
Needs:
• More efficiently utilize resources
• Terminate unused volumes
• Create rules to evaluate criteria for cost savings opportunities and reservation modifications
• Minimize infrastructure usage during non-working hours
• Improve infrastructure deployment compliance (names and tags)
45. Solution and results
Solution: Implemented CloudHealth Automated Tasks via policies for notification, termination, and start/stop, covering non-compliant infrastructure deployments, budget overages, weekend lights-out rules, and reservation modifications.
Results: More than 21% saved in the last year!
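The automated policies described here pair a condition with an action. The Python sketch below is a generic, hypothetical rule evaluator, not CloudHealth's API or policy syntax: the required tag set, the `environment` tag, and the resource dictionary shape are all assumptions. It returns the actions to take rather than performing them, which keeps notify, stop, or terminate steps reviewable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

REQUIRED_TAGS = {"Name", "owner", "environment"}  # hypothetical tagging standard


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[dict], bool]
    action: str  # "notify", "stop", or "terminate"


def missing_tags(resource: dict) -> bool:
    return not REQUIRED_TAGS.issubset(resource["tags"])


def nonproduction_on_weekend(resource: dict) -> bool:
    return (resource["tags"].get("environment") != "production"
            and resource["state"] == "running"
            and resource["now"].weekday() >= 5)  # Saturday or Sunday


POLICIES = [
    Policy("deployment-compliance", missing_tags, "notify"),
    Policy("weekend-lights-out", nonproduction_on_weekend, "stop"),
]


def evaluate(resource: dict, policies: list[Policy] = POLICIES) -> list[tuple[str, str]]:
    """Return (policy name, action) for every policy the resource violates."""
    return [(p.name, p.action) for p in policies if p.condition(resource)]


if __name__ == "__main__":
    dev_box = {"tags": {"Name": "build-01", "environment": "dev"},
               "state": "running", "now": datetime(2024, 1, 6, 12)}  # a Saturday
    print(evaluate(dev_box))
    # [('deployment-compliance', 'notify'), ('weekend-lights-out', 'stop')]
```

Budget-overage or reservation-modification rules can be added the same way, as further `Policy` entries with their own conditions.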
46. Managing the cloud for improved efficiency
Visibility from integrated reporting
Optimization through integrated recommendations
Governance with active policy management
48. Learn more about how Cloud Service Management can help you!
Download our free eBook: Cloud Service Management for Smart People
http://bit.ly/AWSsantaclara2016
Stop by Booth #615