Large-scale Infrastructure Automation at Verizon

Verizon's networks and infrastructure touch nearly 70% of global internet traffic every single day. The many datacenters that support this, and many other large-scale Verizon services, are our lifeblood. This talk provides a glimpse into the work being done to reimagine how we design and operate the software that runs our internal computing grids, and how we enable a large body of development staff to ship jobs and services to the grid every single day. We’ll cover how Consul and Vault make invaluable building blocks in modern distributed systems, and highlight the importance of empowering teams through well-designed infrastructure systems.

  1. LARGE-SCALE INFRASTRUCTURE AUTOMATION. Timothy Perrett, HashiConf 2016
  2. HELLO.
  3. STATE OF THE INDUSTRY
  4. PROLIFERATION OF RESOURCE MANAGEMENT
  5. LINUX CONTAINERS ARE FINALLY POPULAR
  6. NOSQL EVERYWHERE
  7. NOSQL EVERYWHERE (with RDBMS still commonplace)
  8. OBSERVATIONS.
  9. “Infrastructure engineering is 60% social, and only 40% technical. Changing people is far more important than changing technology.”
  10. “Enable sociological change. Technological changes are an implementation detail.”
  11. “Operational complexity is often proportional to the lack of developer responsibility.”
  12. FLEXIBILITY vs. CONSTRAINT: business staff
  13. FLEXIBILITY vs. CONSTRAINT: engineering staff
  14. “Constraints liberate. Liberties constrain.” - Runar Bjarnason
  15. BRIEF HISTORY. We have to go back!
  16. 4 YEARS AGO.
  17. 4 YEARS AGO. Pretty typical configuration management. Centralized Chef servers. Lots of unmaintainable Ruby. Ruby that generates Ruby, which is evaluated at runtime (yikes!). No developer contract. Operations need to understand every application in detail. Going from code complete to deployed took around two weeks.
  18. 3 YEARS AGO.
  19. 3 YEARS AGO. Implemented immutable machine images with HashiCorp Packer. The developer/ops contract becomes an RPM/DEB package along with two YAML manifests: one for provisioning, another for runtime deployment setup. The entire release workflow is driven from source repositories and orchestrated with many linked Jenkins jobs and schedules. Going from code complete to deployed took around 40 minutes.
  20. TODAY.
  21. TODAY. The developer/operations contract is just a Linux container. The repository contains a YAML manifest. Realization that placement and orchestration are entirely separate. Intelligent and fully automated cleanup. Application dependency management. Automated traffic bleeding. Integrated alerting with Prometheus; general notifications via Slack or email. Going from code complete to deployed takes around 5 minutes.
  22. NELSON.
  23. “Desperate affairs require desperate remedies.” - Vice Admiral Horatio Nelson, 1758-1805
  24. GOALS.
  25. GOALS. System elements should be awesome at just one thing. Reduce system complexity by increasing the responsibility of engineering teams. Break it, you bought it. All application specifications are checked into source control. Focus on orchestration, not placement. Force automation in every aspect of work. Manual access to systems is a crutch that enables automation avoidance.
  26. UNITS.
  27. job
  28. Example job manifest:
      ```yaml
      - name: hello world
        type: job
        description: >
          mindlessly prints hello world to the console for five minutes
        schedule: hourly
        retries: 2
        expiration_policy: >
          retain-latest-two-major
        dependencies:
          - ref: example@3.1
      ```
      Slide callouts: unit, type, job stuff.
  29. service
  30. Example service manifest:
      ```yaml
      - name: howdy
        type: service
        description: >
          always responds with hello world
        ports:
          - default->8080/http
        expiration_policy: >
          retain-latest-two-major
        dependencies:
          - ref: foobar@3.1
      ```
      Slide callouts: unit, type, service stuff.
  31. edge proxy
  32. Example edge proxy manifest:
      ```yaml
      - name: foobar-proxy
        type: proxy
        description: >
          proxy inbound from outside
        routes:
          - name: expose the ssl port
            expose: inbound->443/https
            destination: edge@1.3->default
        expiration_policy: >
          retain-until-deprecated
      ```
      Slide callout: routes.
  33. WORKFLOWS.
  34. failure domain
  35. container replication
  36. alerting, routing, discovery
  37. scheduling
  38. credentials
  39. LIFECYCLE.
  40. User activated
  41. Graph pruning
  42. Upgraded!
  43. Automatically terminated
  44. LIFECYCLE. Various cleanup strategies: graph pruning; explicit deprecation cycles; user-selected version policies (retain last two major, retain last two minor, retain latest, retain always). Eliminates the “Do we still need this?” conversations between ops and development. The slide repeats the hello world job manifest from slide 28, whose expiration_policy is retain-latest-two-major.
  45. TL;DR. Automate everything. Your future sanity depends on it. Define concrete protocols at system integration points; favor machine-verifiable protocols where possible. Your path to success involves people. Listen, learn, and be open to criticism. Consul & Vault provide building-block functionality that just works. Never settle for mediocre tools. Know when buying is better than building, but don’t be afraid to build if it adds value.
  46. EOF. WE’RE HIRING! timperrett, github.com/verizon

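The TODAY slide lists automated traffic bleeding: moving traffic from an old deployment to its replacement gradually rather than cutting over at once. The deck gives no detail on how Nelson does this. The following is only a minimal sketch that assumes a linear schedule expressed as integer percentages. The function name `bleed_schedule` is hypothetical.

```python
def bleed_schedule(steps: int) -> list[tuple[int, int]]:
    """Return (old_percent, new_percent) pairs that shift traffic linearly.

    Hypothetical helper: each step moves roughly 100/steps percent of the
    traffic to the new version, and the final step sends all of it there.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule: list[tuple[int, int]] = []
    for step in range(1, steps + 1):
        new_percent = round(100 * step / steps)  # share routed to the new version
        schedule.append((100 - new_percent, new_percent))
    return schedule


if __name__ == "__main__":
    for old, new in bleed_schedule(4):
        print(f"old={old}% new={new}%")
```

For example, `bleed_schedule(4)` yields (75, 25), (50, 50), (25, 75), (0, 100). A production version would likely pause between steps and check health signals before continuing. This sketch only computes the weights.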
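The TL;DR slide recommends machine-verifiable protocols at integration points, and the UNITS slides show the manifest that developers check in. A sketch of such a check follows. It assumes the manifest has already been parsed into Python dicts (as a YAML loader would produce). The required fields per unit type are inferred from the three example manifests and are assumptions, not Nelson's actual schema. `validate_units` and the field tables are hypothetical names.

```python
from typing import Any

# Hypothetical rules inferred from the three example manifests on the slides;
# these are not Nelson's actual schema.
COMMON_FIELDS = {"name", "type", "description", "expiration_policy"}
FIELDS_BY_TYPE = {
    "job": {"schedule"},
    "service": {"ports"},
    "proxy": {"routes"},
}


def validate_units(units: list[dict[str, Any]]) -> list[str]:
    """Check already-parsed manifest units; an empty result means no problems."""
    problems: list[str] = []
    for index, unit in enumerate(units):
        label = unit.get("name", f"unit #{index}")
        unit_type = unit.get("type")
        if not isinstance(unit_type, str) or unit_type not in FIELDS_BY_TYPE:
            problems.append(f"{label}: unknown type {unit_type!r}")
            continue
        # Report every required field the unit is missing, in a stable order.
        required = COMMON_FIELDS | FIELDS_BY_TYPE[unit_type]
        for field in sorted(required - set(unit)):
            problems.append(f"{label}: missing field {field!r}")
        retries = unit.get("retries", 0)
        if not isinstance(retries, int) or retries < 0:
            problems.append(f"{label}: retries must be a non-negative integer")
    return problems


if __name__ == "__main__":
    # The job unit from the slides, as a YAML loader would hand it over.
    hello_world = {
        "name": "hello world",
        "type": "job",
        "description": "mindlessly prints hello world to the console for five minutes\n",
        "schedule": "hourly",
        "retries": 2,
        "expiration_policy": "retain-latest-two-major\n",
        "dependencies": [{"ref": "example@3.1"}],
    }
    print(validate_units([hello_world]))  # an empty list: nothing to report
```

The validator returns a list of problems instead of raising on the first one. A CI step can then report everything wrong with a manifest in a single pass.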
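The example manifests on slides 28 to 32 use compact strings:

- `example@3.1` and `foobar@3.1` for dependencies
- `default->8080/http` for a service port
- `inbound->443/https` and `edge@1.3->default` in a proxy route

The slides do not spell out the grammar. This parser is a sketch that assumes three forms:

- `unit@version` with dot-separated numeric versions
- an optional `->portname` suffix on a reference
- `name->number/protocol` for ports

`UnitRef`, `PortSpec`, and the parse functions are hypothetical names.

```python
from typing import NamedTuple, Optional


class UnitRef(NamedTuple):
    unit: str
    version: tuple[int, ...]
    port: Optional[str] = None  # only route destinations name a port


class PortSpec(NamedTuple):
    name: str
    number: int
    protocol: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.1' into (3, 1); int() raises ValueError on non-numeric parts."""
    return tuple(int(part) for part in text.split("."))


def parse_ref(text: str) -> UnitRef:
    """Parse 'example@3.1' or, with a port suffix, 'edge@1.3->default'."""
    target, _, port = text.partition("->")
    unit, at, version = target.partition("@")
    if not at or not unit or not version:
        raise ValueError(f"expected unit@version, got {text!r}")
    return UnitRef(unit, parse_version(version), port or None)


def parse_port(text: str) -> PortSpec:
    """Parse 'default->8080/http' or 'inbound->443/https'."""
    name, arrow, rest = text.partition("->")
    number, slash, protocol = rest.partition("/")
    if not arrow or not slash or not name or not protocol:
        raise ValueError(f"expected name->port/protocol, got {text!r}")
    return PortSpec(name, int(number), protocol)


if __name__ == "__main__":
    print(parse_ref("edge@1.3->default"))
    print(parse_port("inbound->443/https"))
```

Parsing versions into integer tuples means `(1, 10)` correctly sorts after `(1, 9)`, which plain string comparison would get wrong.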
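The LIFECYCLE slides (40 to 43) walk through a sequence: a user activates a deployment, the graph is pruned, a unit is upgraded, and leftovers are automatically terminated. A minimal sketch of the pruning step follows. It assumes that deployments a user explicitly activated act as roots and that dependencies are graph edges. Under that assumption, anything no longer reachable from a root becomes a termination candidate. This is plain reachability, the mark phase of a mark-and-sweep. The `unit@version` node names mirror the manifests. The graph itself and the function name `unreachable` are hypothetical.

```python
from collections import deque


def unreachable(roots: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return deployments that no root reaches through dependency edges.

    `depends_on` maps a deployment to the deployments it depends on; those
    unreachable from `roots` are candidates for automatic termination.
    """
    reachable: set[str] = set()
    queue = deque(roots)
    while queue:  # breadth-first walk from every root
        node = queue.popleft()
        if node in reachable:
            continue
        reachable.add(node)
        queue.extend(depends_on.get(node, set()))
    every_node = set(depends_on) | {dep for deps in depends_on.values() for dep in deps}
    return every_node - reachable


if __name__ == "__main__":
    # Hypothetical state after howdy was upgraded from 1.0 to 1.1.
    graph = {
        "howdy@1.0": {"foobar@3.1"},
        "howdy@1.1": {"foobar@3.2"},
        "foobar@3.1": set(),
        "foobar@3.2": set(),
    }
    print(sorted(unreachable({"howdy@1.1"}, graph)))  # ['foobar@3.1', 'howdy@1.0']
```

The result is only a candidate set. The LIFECYCLE slide lists graph pruning alongside explicit deprecation cycles and user-selected version policies, so a real system would presumably weigh those before stopping anything.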
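Slide 44 lists user-selected version policies: retain last two major, retain last two minor, retain latest, and retain always. The manifests use the spelling `retain-latest-two-major`. The sketch below decides which deployed versions each policy would terminate. It assumes versions are `(major, minor, patch)` tuples. It interprets "last two major" as the newest release within each of the two most recent major lines, and "last two minor" the same way over `(major, minor)` lines. That interpretation and the other three policy spellings are assumptions. The proxy's `retain-until-deprecated` depends on deprecation state and is left out.

```python
from typing import Callable

Version = tuple[int, int, int]  # (major, minor, patch)


def newest_per_line(versions: list[Version], line_of: Callable[[Version], tuple], lines: int) -> set[Version]:
    """Keep the newest version in each of the `lines` most recent release lines."""
    newest: dict[tuple, Version] = {}
    for version in versions:
        line = line_of(version)
        if line not in newest or version > newest[line]:
            newest[line] = version
    recent = sorted(newest, reverse=True)[:lines]
    return {newest[line] for line in recent}


# Hypothetical policy spellings; only "retain-latest-two-major" appears in the slides.
POLICIES: dict[str, Callable[[list[Version]], set[Version]]] = {
    "retain-latest": lambda vs: {max(vs)} if vs else set(),
    "retain-always": lambda vs: set(vs),
    "retain-latest-two-major": lambda vs: newest_per_line(vs, lambda v: v[:1], 2),
    "retain-latest-two-minor": lambda vs: newest_per_line(vs, lambda v: v[:2], 2),
}


def versions_to_terminate(policy: str, deployed: list[Version]) -> list[Version]:
    """Return the deployed versions that the chosen policy would not retain."""
    keep = POLICIES[policy](deployed)
    return sorted(v for v in deployed if v not in keep)


if __name__ == "__main__":
    deployed = [(1, 0, 0), (2, 0, 0), (2, 1, 0), (3, 0, 0), (3, 1, 0), (3, 1, 2)]
    print(versions_to_terminate("retain-latest-two-major", deployed))
    # [(1, 0, 0), (2, 0, 0), (3, 0, 0), (3, 1, 0)]
```

Encoding each policy as a function from the deployed versions to the set to keep means a new policy is a single table entry. Neither the cleanup loop nor the manifest format needs to change.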