Alert workflow in Gaming DevOps
Eduardo Saito
Director of Engineering - Server Operations
GREE International
November 2013

Traditional alert workflow
[Diagram labels: Ops, NOC, SME (Network, DBA,…), Dev]

Alert workflow – previous
[Diagram labels: Ops, Dev, Critical, Critical, NonCritical]
Ops: Where’s the runbook for this?
Ops: Is it an app bug or a system issue?
Ops: Who’s the developer of this game? Phone #?
Ops: I can’t find the developer… who’s his manager?

Alert workflow 2.0
[Diagram labels: Ops, Critical, Dev, alongside the same four Ops questions as the previous workflow]

Alert workflow 3.0 – current
[Diagram labels: Ops; Dev, Project X, Server; Dev, Project Y, Client, Android; Dev, …]
Each alert goes directly to the right team that can resolve it!
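One way to get "each alert goes directly to the right team" is a lookup keyed on the same attributes the diagram labels use: project, component (server or client), and platform. The sketch below is a minimal Python illustration under that assumption; the `Alert` fields, the team names, and the fallback to Ops are hypothetical, not GREE's actual routing configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert fields: which game project and which part of it failed.
    project: str
    component: str                  # "server" or "client"
    platform: Optional[str] = None  # "android" / "ios" for client-side alerts


# Hypothetical routing table mirroring the diagram labels:
# (project, component, platform) -> dev on-call team.
DEV_ROUTES = {
    ("Project X", "server", None): "dev-project-x-server",
    ("Project Y", "client", "android"): "dev-project-y-android",
}

# Assumed fallback: an alert with no matching dev team goes to Ops.
FALLBACK_TEAM = "ops"


def route(alert: Alert) -> str:
    """Return the team that should be paged for this alert."""
    key = (alert.project, alert.component, alert.platform)
    return DEV_ROUTES.get(key, FALLBACK_TEAM)


if __name__ == "__main__":
    print(route(Alert("Project X", "server")))             # dev-project-x-server
    print(route(Alert("Project Y", "client", "android")))  # dev-project-y-android
    print(route(Alert("Project Y", "client", "ios")))      # ops
```

Keeping the table as plain data means adding a new game or platform is a one-line change, and the fallback guarantees no alert is silently dropped.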

Alerts go to the person who can resolve them
Type          Scope                                Checked by        Who to page?
ELB           Load balancer health-check           ELB               No one – email alert only
System-level  Check CPU / disk / memory / network  Pingdom / Nagios  Ops team
App-level     Application issues / bugs            Pingdom           Dev and Ops teams

App-level alerts can be triggered by issues in:
• Server-side
• Client-side
  • iOS
  • Android
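The alert-type table is a static policy, so it can live as data next to the alerting code rather than in people's heads. A minimal Python sketch, assuming alerts arrive already classified into the three types above; the `AlertType`, `CHECKED_BY`, `WHO_TO_PAGE`, and `describe` names are illustrative and are not part of Pingdom's or Nagios's configuration.

```python
from enum import Enum
from typing import Dict, Tuple


class AlertType(Enum):
    ELB = "ELB"
    SYSTEM = "System-level"
    APP = "App-level"


# "Checked by" column of the table.
CHECKED_BY: Dict[AlertType, Tuple[str, ...]] = {
    AlertType.ELB: ("ELB",),
    AlertType.SYSTEM: ("Pingdom", "Nagios"),
    AlertType.APP: ("Pingdom",),
}

# "Who to page?" column; an empty tuple means nobody is paged (email only).
WHO_TO_PAGE: Dict[AlertType, Tuple[str, ...]] = {
    AlertType.ELB: (),
    AlertType.SYSTEM: ("Ops",),
    AlertType.APP: ("Dev", "Ops"),
}


def describe(alert_type: AlertType) -> str:
    """Summarize how an alert of the given type is checked and delivered."""
    checks = " / ".join(CHECKED_BY[alert_type])
    teams = WHO_TO_PAGE[alert_type]
    if not teams:
        return f"{alert_type.value} (checked by {checks}): email alert only"
    return f"{alert_type.value} (checked by {checks}): page {' + '.join(teams)}"


if __name__ == "__main__":
    for t in AlertType:
        print(describe(t))
```

For app-level alerts, the "Dev" entry could be narrowed further by whether the issue is server-side or client-side (iOS or Android), so only that game's relevant developers are paged.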

Dev and Ops are responsible
Team       On-call
Ops        8
Dev        32, from 20 games (server-side, or client-side Android or iOS)
Analytics  5

Big display dashboard = quick status

IM bot = better communication
1. The Skype bot informs the game channel that an alert was triggered.
2. Ops and Dev receive the alert and troubleshoot.
3. The Skype bot detects that the issue is resolved and sends an all-clear.
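The bot's three steps amount to a small state machine per check: announce on the transition to failing, stay quiet while Ops and Dev troubleshoot, and post an all-clear on the transition back to healthy. Below is a minimal Python sketch of that idea; it does not use any real Skype API, so the `post(channel, message)` callback, the one-channel-per-game naming, and the message text are hypothetical placeholders for whatever chat integration is actually used.

```python
from typing import Callable, Dict, List, Tuple


class AlertBot:
    """Announce alert state changes in each game's chat channel."""

    def __init__(self, post: Callable[[str, str], None]) -> None:
        # post(channel, message) is a hypothetical hook into the chat system.
        self._post = post
        # Last known state per (game, check): True while the alert is firing.
        self._firing: Dict[Tuple[str, str], bool] = {}

    def on_check_result(self, game: str, check: str, ok: bool) -> None:
        key = (game, check)
        was_firing = self._firing.get(key, False)
        channel = f"#{game}"  # hypothetical one-channel-per-game convention
        if not ok and not was_firing:
            # Step 1: tell the game channel that an alert was triggered.
            self._post(channel, f"ALERT: {check} is failing for {game}")
            self._firing[key] = True
        elif ok and was_firing:
            # Step 3: the check recovered, so send the all-clear.
            self._post(channel, f"ALL CLEAR: {check} recovered for {game}")
            self._firing[key] = False
        # Repeated failures or repeated successes post nothing, so the
        # channel is not flooded while Ops and Dev troubleshoot (step 2).


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    bot = AlertBot(lambda ch, msg: log.append((ch, msg)))
    bot.on_check_result("project-x", "api-health", ok=False)
    bot.on_check_result("project-x", "api-health", ok=False)
    bot.on_check_result("project-x", "api-health", ok=True)
    for ch, msg in log:
        print(ch, msg)
```

Posting only on state transitions keeps the shared channel readable: everyone sees exactly one "alert" and one "all clear" message per incident.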

Thank you!
eduardo.saito@gree.net
We’re hiring! Vancouver and San Francisco
http://gree-corp.com/jobs
