Automation and Culture Changes for 40M Subscriber Platform Operation
Yuichiro Sano
Yahoo! JAPAN
ysano@yahoo-corp.jp
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
About me
Yuichiro Sano
• Yahoo! JAPAN Platform: Head of the Cloud Platform Department
• Responsible for operational support, promotion and management of the on-premises platforms (PCF, k8s) at Yahoo! JAPAN
About Yahoo! JAPAN
Why Cloud Foundry?
• Needed to modernize systems and reduce operational cost
• Productivity: needed an environment where engineers could focus on building services and not worry about the rest
• BOSH
• Buildpack model
• Needed auto-scaling
• Wanted to leverage our existing OpenStack environment
The Journey to Cloud Foundry
• 2015 Apr: PoC start
• 2015 Oct: Pilot development start
• 2016 Oct: Pilot release
• 2017 Apr: PCF GA (OpenStack)
• 2017 Oct: OpenStack +1 cluster
• 2018 Apr: vSphere +2 clusters
• Today: OpenStack x2, vSphere x4
App instances (AI) along the way: 1,000 AI → 3,000 AI → 11,000 AI (today)
Current Cluster Spec
                        Sandbox  Development  Production
Clusters                      2            2           6
Hypervisors (HV)             40          120         360
Diego cells                   -          300         900
App instances                 -        8,000      11,000
Total requests/sec            -            -     600,000
Log traffic (logs/sec)        -            -      90,000
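The production column supports a few back-of-the-envelope ratios. The sketch below derives them; they are fleet-wide averages over all six production clusters and assume load is spread evenly, which the slide does not claim.

```python
# Production column of the "Current Cluster Spec" table.
PRODUCTION = {
    "hypervisors": 360,
    "diego_cells": 900,
    "app_instances": 11_000,
    "requests_per_sec": 600_000,
    "logs_per_sec": 90_000,
}


def densities(spec: dict[str, int]) -> dict[str, float]:
    """Derive simple per-unit averages from the raw cluster counts."""
    return {
        "cells_per_hv": spec["diego_cells"] / spec["hypervisors"],
        "ais_per_cell": spec["app_instances"] / spec["diego_cells"],
        "rps_per_ai": spec["requests_per_sec"] / spec["app_instances"],
        "logs_per_ai_per_sec": spec["logs_per_sec"] / spec["app_instances"],
    }


if __name__ == "__main__":
    for name, value in densities(PRODUCTION).items():
        print(f"{name}: {value:.1f}")
```

Run as a script, it prints 2.5 cells per hypervisor, about 12.2 app instances per cell, about 54.5 requests/sec per app instance and about 8.2 log lines/sec per app instance.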
Ecosystem around PCF
• RDB
• MQ
• Splunk
• RBAC
• FaaS
• Repository
• KVS
• Object Storage
• Redis
• GitHub Enterprise
Organizational Structure
System dep.
Platform dep.
Cloud Platform dep.
Infrastructure dep.
other Platform dep., Network, IaaS, LB, other Platform dep., other Platform dep.
Team Structure
Cloud Platform dep.
PaaS CaaS
Introducing PCF to the Organization
Most people were new to PCF, so we did the following:
• Held company-wide seminars and hands-on workshops for more than 1,000 developers
• Maintained Japanese-language reference material and tutorials
• Provided best-practice guidance for various development use cases
• Used Pivotal consulting services for platform and SRE support
• Created a service broker to handle the YJ-specific cookie offload
Growing Pains
• As clusters were added, the operational burden and alert-handling load increased
• As the number of apps grew, cluster capacity fell short and PCF was unable to accept new apps
• As log volume increased, our log management system became overloaded
• We found that app configuration mistakes (e.g., timeouts) could affect the cluster (Gorouter)
• Time spent on user support issues made it difficult to introduce stable operations policies
Clearly Defining the Role of Each Team
PCF users: 2,500 engineers
PaaS team:
• CRE: Propose efficient usage methods and proactively resolve issues to ease the transition for engineers
• SRE: Platform as a Product. Focus on increasing system reliability by preventing failures and promoting automation
CRE team
Developer Counterpart
• First-line support for users
• Contact users when an application misbehaves
Developer Education
• Deliver workshops
• Provide default CI/CD templates
• Best practices
• Application architecture support
Service Integrations
• Create service broker services
• Support service team
SRE team
CRE Counterpart
• Implements CRE needs
• Works closely with CRE on capacity planning
Platform Updates
• PCF updates
• Add new features around the platform
• Logging and metrics for users
• Automate all the Things …
Platform Stability
• Define SLOs and SLAs
• Platform monitoring, alerts, etc.
• Define alerts: what, when, how?
• Capacity planning
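Defining an SLO turns "how reliable is reliable enough" into arithmetic: the SLO fixes an error budget of allowed unavailability per window. The slides do not state Yahoo! JAPAN's actual targets, so the 99.9% SLO and 30-day window used below are hypothetical; the sketch only shows the standard budget calculation.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a window.

    slo is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    return 1.0 - downtime / error_budget(slo, window)
```

For a 99.9% SLO over 30 days the budget is 43 minutes 12 seconds; 21 minutes 36 seconds of downtime would leave half of it. Alerting on budget burn rather than on every blip is one way to answer "what, when, how?" for alerts.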
SRE
Automate all the Things
• Install PCF
• Update PCF
• Backup with BBR
• Cluster integrity
• Prometheus deployment
• Quota and usage check
• Buildpack update
• Logs forwarder deployment
• IaaS layer check (blobstore, ...)
• Smoke test
• User/Space/Org management
• etc. …
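The "Quota and usage check" item could look roughly like the sketch below: flag orgs whose memory usage is close to their quota before they hit the limit. The `OrgUsage` shape, the 80% threshold and memory-only accounting are all assumptions for illustration; a real job would fetch usage and quota figures from the platform's API, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgUsage:
    name: str
    memory_used_mb: int   # memory currently reserved by the org's app instances
    memory_quota_mb: int  # memory limit from the org's quota (<= 0: no usable limit)


def orgs_near_quota(orgs: list[OrgUsage], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (org name, usage ratio) for orgs at or above threshold, highest first."""
    flagged: list[tuple[str, float]] = []
    for org in orgs:
        if org.memory_quota_mb <= 0:
            continue  # no usable quota value to compare against
        ratio = org.memory_used_mb / org.memory_quota_mb
        if ratio >= threshold:
            flagged.append((org.name, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```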
PCF Install & Update Pipeline
Create IaaS → Deploy Ops Manager → Deploy BOSH → Upload tile → Install tile
(1 day)
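The value of the install and update pipeline comes from running each stage strictly in order and stopping as soon as one fails. A minimal sketch of that control flow is below; the slide does not say which CI system runs the pipeline, and the stage names in the usage note are hypothetical.

```python
from typing import Callable

# A stage is a (name, action) pair; the action raises on failure.
Stage = tuple[str, Callable[[], None]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages strictly in order, stopping at the first failure.

    Returns the names of all stages on success. On failure, raises
    RuntimeError naming the failed stage and the stages already completed,
    so later stages (e.g. installing a tile) never run against a
    half-built foundation.
    """
    completed: list[str] = []
    for name, action in stages:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"stage {name!r} failed; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed
```

Usage would pass stages such as `("create-iaas", ...)`, `("deploy-opsman", ...)`, `("deploy-bosh", ...)`, `("upload-tile", ...)` and `("install-tile", ...)`, each wrapping whatever tooling actually performs that step.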
Buildpack Update Pipeline
Stages: Dev update, Staging update, Production update, Sandbox update
Runs every month for 8 buildpacks
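A monthly buildpack rollout across several environments is usually gated: update one environment, verify it, then move on. The sketch below assumes that gating model; the environment order, the buildpack names and the `update` and `smoke_test` callables are hypothetical stand-ins for the real pipeline steps, which the slide does not detail.

```python
from typing import Callable


def promote_buildpacks(
    environments: list[str],
    buildpacks: list[str],
    update: Callable[[str, str], None],
    smoke_test: Callable[[str], bool],
) -> list[str]:
    """Update every buildpack in one environment at a time.

    An environment's smoke test must pass before the next environment is
    touched. Returns the environments that were updated and passed; raises
    RuntimeError (halting the rollout) at the first failed smoke test.
    """
    passed: list[str] = []
    for env in environments:
        for bp in buildpacks:
            update(env, bp)
        if not smoke_test(env):
            raise RuntimeError(f"smoke test failed in {env}; halting after {passed}")
        passed.append(env)
    return passed
```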
PCF Backup
Scheduled jobs: every-2am, every-3am, every-4am, every-5am
PCF Smoke-Test
Runs against all environments every 10 minutes
Monitoring all the Things
• App instances
• Log traffic
• Cell capacity
• Average response time
• cf push duration
• Router traffic
• CPU usage
• Cluster health-check probe
• Memory usage
• Log missing rate
• etc. …
Cluster Dashboard
Panels: Router RPS, Router goroutines, Router latency
Cell Capacity
Two panels, each showing Used vs. Available capacity
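Growing Pains noted that clusters ran out of room for new apps. The sketch below shows why a per-cell view matters alongside the cluster-wide Used/Available total: an app instance cannot span cells, so free memory fragmented across cells may not fit an instance at all. It is a simplification that models memory only (Diego placement also considers disk and container limits), and the cell sizes in the test are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    total_mb: int
    allocated_mb: int


def utilization(cells: list[Cell]) -> float:
    """Fraction of cluster memory already allocated to app instances."""
    total = sum(c.total_mb for c in cells)
    return sum(c.allocated_mb for c in cells) / total


def instances_that_fit(cells: list[Cell], instance_mb: int) -> int:
    """How many more instances of instance_mb can still be placed.

    Counted per cell with floor division, because an instance must fit
    entirely inside one cell's free memory.
    """
    return sum((c.total_mb - c.allocated_mb) // instance_mb for c in cells)
```

With two 16 GB cells each holding 1 GB free, the cluster shows 2 GB available, yet `instances_that_fit(cells, 2048)` is 0.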
App Latency
Latency percentiles shown: 99th, 90th, 50th
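The App Latency panel plots the 99th, 90th and 50th percentiles. How the dashboard computes them is not stated; the nearest-rank method below is one simple definition, used here for illustration, and metrics systems that bucket or interpolate samples can report slightly different values.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Tracking p99 next to p50 is what exposes a slow tail that an average response time would hide.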
Smoke Test
Panels: cf push duration, cf scale duration, cf start duration, cf delete duration
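The smoke test runs against every environment every 10 minutes and records how long `cf push`, `cf scale`, `cf start` and `cf delete` take. A minimal timing harness is sketched below, assuming the cf CLI is installed, logged in and targeted at a test space. The app name, app path and instance count are hypothetical, and scheduling is left to whatever CI or cron system runs it. The `runner` and `clock` parameters exist only so the steps can be exercised without a live foundation.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_cf(args: Sequence[str]) -> None:
    """Run a cf CLI command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(["cf", *args], check=True, capture_output=True)


def smoke_test(app: str, path: str, runner: Runner = run_cf,
               clock: Callable[[], float] = time.monotonic) -> dict[str, float]:
    """Push, scale, start and delete a throwaway app; return seconds per step.

    Any failing step raises, which the scheduler can record as a failed run.
    """
    steps = [
        ("push", ["push", app, "-p", path]),
        ("scale", ["scale", app, "-i", "2"]),
        ("start", ["start", app]),
        ("delete", ["delete", app, "-f"]),
    ]
    durations: dict[str, float] = {}
    for name, args in steps:
        started = clock()
        runner(args)
        durations[name] = clock() - started
    return durations
```

The returned durations map directly onto the four dashboard panels; a sudden jump in push duration is often the first visible sign of trouble in staging or the blobstore.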
BOSH
Use BOSH for Logging Components
• Alert & App logs are transferred to the platform
• Using BOSH makes it easier to scale the nozzle and relay components
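Diagram: App A and App B → Loggregator, which feeds two paths:
• Monitor nozzle → Monitor relay → Internal notifications
• Splunk nozzle → Splunk relay → Splunk
Nozzles and relays are deployed with BOSH, which makes them easier to scale.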
Log Summary
Panel: Noisy Neighbor
Missing Log Data Dashboard
Panels: every 1 hour: 100%; every 1 minute: 100%
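The slides do not say how missing log data is measured. One common approach, assumed here, is a canary app that writes sequence-numbered log lines; comparing what it sent with what reached the sink (Splunk, in this setup) over a 1-minute or 1-hour window gives the missing rate. If the 100% panels show delivery rate, they correspond to a missing rate of 0.

```python
def missing_rate(sent_seq: range, received_seq: list[int]) -> float:
    """Fraction of canary log lines that never arrived at the log sink.

    sent_seq: sequence numbers the canary emitted in the window.
    received_seq: sequence numbers found at the sink; duplicates and numbers
    from outside the window are ignored.
    """
    expected = set(sent_seq)
    if not expected:
        return 0.0
    arrived = expected.intersection(received_seq)
    return 1.0 - len(arrived) / len(expected)
```

Using sequence numbers rather than raw counts keeps duplicate deliveries from masking real loss.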
Benefits of Platform Automation
• Automation has reduced the SRE team's platform install and update work by 85%
• Precision has increased and human error has been eliminated, saving considerable effort and time
• Anyone can now work with the platform, so we are no longer dependent on individual “rockstars”
Benefits of Focus on Observability
• Time to identify problems has been radically reduced
• We moved from a reactive to a proactive approach to problem resolution
• Contributed to a more sustainable, stress-free work environment
Outcomes
Outcomes
Speed
• Time to deploy: 75%
• Workspace provisioning: weeks → 4 hrs
Scaling
• Time to scale: weeks → 5 secs
• 600k TPS
Reliability
• Zero downtime
• VM recovery: 2 hrs → 2 mins
Outcomes
Security
• Patch frequency: 5x
• Time to patch: 1 day → 4 hrs
Productivity
• 11,000 AIs
• 3,845 apps in production
Future Plans
Increasing the Value for Users
• Adding further clusters: x6 ⇒ x12
• Evolving the log architecture: relay ⇒ queuing
• Service broker support for all platforms: 3PF ⇒ 1XXPF
• Proactive operations: SREs able to safely take a nap on the job ☺
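The planned move from relays to queuing is about decoupling log producers from a slow or unavailable sink. The slide does not name a queueing technology; the in-process bounded buffer below is only a hypothetical stand-in for what would more likely be an external, durable message queue. It shows the core behavior: the nozzle side never blocks, overflow is counted rather than silently lost, and the sink drains at its own pace.

```python
import queue
from typing import Callable


class QueuedRelay:
    """Hypothetical buffer between a log nozzle and a slower sink."""

    def __init__(self, sink: Callable[[str], None], max_buffered: int = 10_000):
        self._queue: queue.Queue = queue.Queue(maxsize=max_buffered)
        self._sink = sink
        self.dropped = 0  # lines rejected because the buffer was full

    def offer(self, line: str) -> None:
        """Called by the nozzle; never blocks, counts a drop if the buffer is full."""
        try:
            self._queue.put_nowait(line)
        except queue.Full:
            self.dropped += 1

    def drain(self) -> int:
        """Forward everything currently buffered to the sink; return how many."""
        sent = 0
        while True:
            try:
                line = self._queue.get_nowait()
            except queue.Empty:
                return sent
            self._sink(line)
            sent += 1
```

The `dropped` counter is the kind of signal that could feed the log missing-rate dashboard.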
> Stay Connected.
#springone@s1p
