These are the slides from my talk at the NFHS Summer Meet 2016. The talk (and these slides) challenged the conventional wisdom that owning servers is a worthwhile endeavor for companies and organizations that create or maintain sophisticated web and mobile operations. AWS case studies were offered as evidence, but the talk tried to be provider-neutral and encouraged newcomers to look at a broad range of IaaS and PaaS providers.
2. “I choose a lazy person to do a hard job.
Because a lazy person will find an easy way to do it.”
-- Bill Gates
3. This is a talk about how our company
used the cloud to systematically
eliminate unnecessary work in order
to get more done and accomplish
our goals.
4. Hi, my name is Luke Chavers
Chief Technology Officer
C2C Schools, LLC
www.c2cschools.com
( luke@c2cschools.com )
5. Software Development Company
In business since 2007
Tracks 2.7 million athletes
Has 84,000 users
Managed 760,000 games/events
Creates web and mobile apps
Focuses on HS/MS Athletics
7. Questions for your consideration …
What are the things that you spend most of your time on?
How many of those directly relate to your organization’s core goals?
How many of those things do you enjoy doing?
Do you really have to do those things to succeed?
8. Here’s where it started for us …
As a software development company …
Why do we spend so much time and money on our infrastructure?
Why do we need as many system admins as programmers?
Why is every software ambition limited by our infrastructure?
9. A few distinctions …
These questions are not universally applicable. We are also not a
sanitation company, but we still need to take out the trash. There are
a few important points that make this situation different …
We don’t have a problem with employees gathering at the dumpster.
We don’t need to devote full office wings to trash mitigation.
The costs do not necessitate waste disposal provisions in contracts.
10. But, with our infrastructure …
Employees are constantly gathering at the IT desks.
The footprint of system ops is bigger than any other dept.
We spend as much on systems as we do on software development.
11. ... yet, we cannot invoice for any of it.
Our customers are paying for the software and understand the line
items and contract provisions related to development; but we’re
expected to factor and absorb the infrastructure expenses.
This is not intuitive or sustainable.
We had to find a way to simplify.
We had to find a way to do less.
12. Doing less means …
Eliminating anything that you’re not trying to be the best at doing.
If you’re not …
… known for it …
… charging (directly) for it …
… interested in competing in its market …
… then you should try to avoid doing it.
13. Doing less means …
Assigning real value to your time.
Your organization spends more than just your hourly rate when you
take something on. The hidden expenses surrounding every effort are
nearly incalculable. Before you say “we can just handle that,
ourselves”, try to consider the true costs.
14. Doing less means …
Assigning real value to complexity
To one degree or another, each unit of complexity you add to your
organization will add burden to, practically, every part of your
organization. Very few things exist in isolation and complexity
creates debt; you’ll pay that debt, with interest.
15. We’ll always need systems;
the cloud doesn’t change that.
The cloud is just a tool that can
make systems much easier.
16. One catch …
Wielding it properly requires you to
fundamentally change how you
approach most problems.
17. The Cloud
A “buzz word” that the world has nearly
destroyed through over-use,
it generally refers to a new computing
model that advocates
“shared tenancy” and per-use pricing.
... however ...
19. Still, it sort of
looks like a cloud.
“Swarm” would’ve
been better, but
marketing wouldn’t
sign off.
20. The Big Idea
Put down your cable crimpers and sell
your servers. Your new home is more
akin to a hotel than a house.
21. AWS
Amazon Web Services
One of the world’s first and largest cloud providers.
Same people who run Amazon.com (it’s an interesting story)
Definitely not the only provider; there are dozens of great companies
We use them most often; so that’s what we’ll talk about today
22. Use Case 1
AWS RDS
A tale of
“Unanticipated Pleasantries”
23. The Perceived Problem
Users and QA staff began reporting debilitating page load times;
customers were unable to use our software effectively.
Panic Ensues
24. The Actual Problem
Our primary database server is bursting to 100% CPU for several
minutes at a time. We must upgrade it, immediately.
Panic Spreads to Nearby Offices
25. Primary Goal
Migrate Primary SQL Server to
a new Dell PowerEdge Server.
Secondary Goals
Minimize Down-Time
Minimize Cost
Have a contingency plan
26. The Plan
Buy a new server from Dell
Migrate all data to AWS “Relational Database Service” (RDS)
Mail the new server to Atlanta
Mail our systems admin to Atlanta
Remove old server; install new server; mail old server to office
Migrate all data from AWS RDS to new server
27. What really happened …
Bought a new server from Dell
Migrated all data to AWS “Relational Database Service” (RDS)
Checked our AWS bill
Someone said: “Wait … that can’t be right …”
Tested page load times
Repurposed the new Dell server
28. What we learned …
AWS RDS is significantly cheaper than colocation
It, apparently, costs less to click “New Server” vs “Book Flight”
The folks at AWS RDS are better at optimizing databases than we are
29. A few things we stopped doing …
Mailing database servers to Atlanta
Buying database administration books/training
Buying database servers from Dell
32. The Perceived Problem
C2C begins negotiations with multiple state organizations; one has an
extraordinary student data importing requirement.
The question comes down: “What will this cost us?”
Panic Ensues
33. The Actual Problem
We don’t know what any compute burden will cost until we turn it on,
but we’re being asked to calculate costs, down to the network cable,
for an extreme, yet theoretical, monster processing problem.
Panic Spreads to Nearby Offices
34. Primary Goal
Calculate the costs of a
huge compute operation.
Secondary Goals
Don’t mess this up
Don’t ask too many questions
Avoid a nervous breakdown
35. The Plan
Get an example data set
Compare it to existing data sets
Approximate the compute burden of the example data
Approximate the compute burden for the entire state
Determine every item that needs to be purchased
Research prices for each part
36. What really happened …
Approximated the compute burden
Paced back and forth in doubt and despair
Remembered reading about EC2 “Auto-Scaling Groups”
Ran a processing test on EC2 w/ ASG
Wrote down the processing time; multiplied by number of schools
Submitted the cost estimate
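The estimate on this slide boils down to multiplication: time one school's import, multiply by the number of schools, and price the result at a per-hour rate. A minimal sketch of that arithmetic follows. Every number in it (seconds per school, school count, hourly rate, instance count) is a made-up placeholder rather than a figure from the talk, and the function name is hypothetical.

```python
import math


def estimate_import_cost(seconds_per_school, school_count,
                         hourly_rate_usd, parallel_instances=1):
    """Extrapolate a one-school benchmark to a statewide cost estimate."""
    # Total compute needed, expressed in instance-hours.
    total_instance_hours = seconds_per_school * school_count / 3600
    # Assumes per-hour billing: each instance's share rounds up to a whole hour.
    hours_per_instance = math.ceil(total_instance_hours / parallel_instances)
    billed_hours = hours_per_instance * parallel_instances
    return {
        "wall_clock_hours": hours_per_instance,
        "billed_instance_hours": billed_hours,
        "estimated_cost_usd": round(billed_hours * hourly_rate_usd, 2),
    }


# Placeholder inputs: 90 s per school, 1,000 schools, $0.10 per instance-hour,
# spread across 5 instances.
estimate = estimate_import_cost(
    seconds_per_school=90,
    school_count=1000,
    hourly_rate_usd=0.10,
    parallel_instances=5,
)
print(estimate)
```

Adding instances shortens the wall-clock time, while the billed total stays roughly the same apart from hour rounding. That is why a single benchmark run is enough to produce a defensible estimate.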
37. What we learned …
Owning hardware reduces our agility and limits our developers
Per-hour pricing significantly reduces both problems
Large purchases require impractical, cross-discipline, cooperation
38. A few things we stopped doing …
Buying equipment in bulk
Asking the financial department for money
Calculating large hardware purchases
40. Use Case 3
AWS S3
A story about
“The Perils of Hoarding”
41. The Perceived Problem
C2C allows its users to upload high-definition videos without limitation;
one day it seemed that everyone realized that, simultaneously.
Panic Ensues
42. The Actual Problem
C2C expected a gradual increase that would allow storage to be
added incrementally. Instead, we were rapidly running out of space.
Panic Spreads to Nearby Offices
43. Primary Goal
Install a new file server
Secondary Goals
It needs to be huge
Minimize Cost
Hurry
44. The Plan
Buy a new, 96 TB, file server
Ship it to the datacenter
Fly a systems admin to the datacenter
Install the new server
Live a happy and productive life
45. What really happened …
Received the new server
Found that 25% of the drives were faulty
Western Digital’s factory was flooded; replacements delayed
We installed the server with only 2 arrays enabled
AWS notified us that S3 pricing was being reduced by 50%
A few weeks later, S3 prices were slashed by another 50%
46. What really happened …
We never brought the failed arrays online
User video upload rates returned to normal
47. What we learned …
Everything we add to our network reduces its fault tolerance
Predicting user behavior is very hard to do
Physics can be a real drag
You can’t build storage for what AWS is leasing it for
48. A few things we stopped doing …
Trying to compete with AWS
Buying screwdrivers and crimp tools
Buying or building storage appliances and servers
50. Main reasons we feared the cloud:
Price Predictability
Our Job Security
51. Use Case 4
AWS Lambda
A story about
“Peering into Rabbit Holes”
52. The Perceived Problem
At a certain point someone introduced a profound idea. If the cloud is
providing solutions that are seemingly better, but that contradict our
classical training, what other fundamentals could we be wrong about?
Panic Ensues
53. The Actual Problem
We’ve always been taught that owning the hardware, tweaking the
database, and running the cables, ourselves, was the only sure way to
ensure things are done properly. After all, our case is special, right?
Panic Spreads to Nearby Offices
54. Primary Goal
Find the most uncomfortable
AWS service and implement it.
Secondary Goals
Realize dramatic cost savings
Reduce system complexity
Reduce maintenance overhead
55. The Plan
Rebuild a major C2C service (grade processing) using AWS Lambda
Take the old service offline
Bring the new service online
Pay close attention to the metrics
56. An interlude …
C2C imports and processes
approximately 20 million student
grades each day.
57. The grade and eligibility
systems, collectively, are an
order of magnitude larger than
any other C2C system.
58. What really happened …
Previously, it took 3x PowerEdge servers 4+ hours.
We could not believe our eyes …
AWS Lambda computed every grade in Mississippi in 15 minutes.
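The slides do not show how the rebuilt service works. One plausible reading is that Lambda lets a large job be cut into many small batches that run at the same time, rather than queueing on three servers. The sketch below illustrates that fan-out idea under stated assumptions. The `handler(event, context)` signature is the standard Python Lambda entry point, but the event shape, the eligibility threshold, and the helper names are hypothetical, and a local thread pool stands in for real parallel invocations.

```python
from concurrent.futures import ThreadPoolExecutor

PASSING_AVERAGE = 70.0  # hypothetical eligibility threshold, not from the talk


def handler(event, context):
    """Lambda-style entry point: process one small batch of students.

    Assumes every student record carries at least one grade.
    """
    results = []
    for student in event["students"]:
        grades = student["grades"]
        average = sum(grades) / len(grades)
        results.append({
            "student_id": student["id"],
            "average": average,
            "eligible": average >= PASSING_AVERAGE,
        })
    return {"results": results}


def chunk(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_all(students, batch_size=2, workers=4):
    """Local stand-in for fan-out: each batch is handled independently.

    On AWS, each event would be a separate function invocation; here a
    thread pool simulates that so the sketch runs anywhere.
    """
    events = [{"students": batch} for batch in chunk(students, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        responses = list(pool.map(lambda e: handler(e, None), events))
    return [row for response in responses for row in response["results"]]


# Made-up sample data for illustration only.
students = [
    {"id": 1, "grades": [90, 80]},
    {"id": 2, "grades": [60, 65]},
    {"id": 3, "grades": [70, 70]},
]
report = process_all(students)
print(report)
```

Because each batch is independent, total run time depends on how many batches can run at once rather than on how many servers are owned.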
59. What we learned …
We don’t need to do the things we thought we had to do
Things will never be the same
The cloud empowers our developers in unprecedented ways
60. A few things we stopped doing …
Renewing datacenter contracts
Buying servers, routers, switches, appliances, and CAT cables
Fighting against the cloud