FAIL... THE RIGHT WAY
NODE.JS IN PRODUCTION
ssw2014.formidablelabs.com | @ryan_roemer formidablelabs.com
WELCOME TO PRODUCTION 
Production can be a rough place for 
your Node.js apps. Things can go very 
wrong out in the wild.
FORMIDABLE LABS
3:00 AM
OUR FOCUS 
Whether on PAAS, IAAS, or bare metal. 
Design for Failure: Keep your Node.js apps up 
Avoidance: Get yourself out of the failover business 
Isolate: One failure at a time 
Analyze: Debug and diagnose problems quickly
1. DESIGN FOR FAILURE 
Fail and recover at multiple levels. 
Let's look at failure from a system 
perspective.
SINGLE NODE.JS WORKER 
Never ignore errors. 
Have a strong bias for killing the 
worker. 
Handle: uncaughtException
Listen: foo.on("error")
Domains
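A minimal sketch of the "bias for killing the worker" idea, assuming some supervisor (such as the cluster master on the next slide) restarts the process after it exits. The names makeFatalHandler and installFatalHandler, and the injected log/exit parameters, are hypothetical and not from the talk.

```javascript
"use strict";

// Hypothetical helper: builds a handler that logs a fatal error once and
// then exits. `log` and `exit` are injected so the handler can be exercised
// without actually killing the process.
function makeFatalHandler(log, exit) {
  var dying = false;
  return function onFatal(err) {
    if (dying) { return; } // ignore further errors while shutting down
    dying = true;
    log("fatal: " + ((err && err.stack) || err));
    exit(1); // non-zero exit code so a supervisor can see the failure
  };
}

// Hypothetical wiring for a real worker process.
function installFatalHandler() {
  var onFatal = makeFatalHandler(console.error, function (code) {
    process.exit(code);
  });

  // An emitter with no "error" listener throws when it emits "error",
  // and that also ends up here.
  process.on("uncaughtException", onFatal);
  return onFatal;
}
```

A worker would call installFatalHandler() once at startup and could also pass the returned function to foo.on("error", ...) for emitters whose errors should be treated as fatal.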
MULTIPLE NODE.JS WORKERS 
Use cluster or recluster to multiplex CPUs and isolate errors.
Workers: die early on errors 
Master: monitor and kill workers
MULTIPLE NODE.JS WORKERS 
var recluster = require("recluster");
var cluster = recluster("./server.js");
cluster.run();

// Hot reload: kill -s SIGUSR2 CLUSTER_PID
process.on("SIGUSR2", function () {
  console.log("Got SIGUSR2, reloading cluster...");
  cluster.reload();
});
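For comparison, the "master monitors, workers die early" split can also be sketched with Node's built-in cluster module. This is an illustration under assumptions, not the talk's code: the supervise helper is hypothetical, and it takes the cluster object as a parameter only so that it can be tried out with a stand-in.

```javascript
"use strict";

// Hypothetical helper: fork `count` workers and fork a replacement whenever
// one exits. `clusterApi` is expected to look like Node's cluster module
// (a fork() method and an "exit" event).
function supervise(clusterApi, count) {
  for (var i = 0; i < count; i++) {
    clusterApi.fork();
  }

  clusterApi.on("exit", function (worker, code, signal) {
    console.error(
      "worker exited (code=" + code + ", signal=" + signal + "), forking a replacement"
    );
    clusterApi.fork();
  });
}

// Possible wiring in a real entry point (left as a comment on purpose):
//
//   var cluster = require("cluster");
//   var os = require("os");
//   if (cluster.isMaster) {          // named isPrimary in newer Node.js releases
//     supervise(cluster, os.cpus().length);
//   } else {
//     require("./server.js");
//   }
```

This sketch restarts workers unconditionally. A production master would likely add a backoff so that a worker crashing at startup does not cause a tight fork loop.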
SERVER 
Use monit or alternatives
Restart the Node.js master
SERVICE 
Load-balancers 
Heartbeat / ping monitors 
Availability zones, etc.
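Load balancers and heartbeat monitors need something cheap to probe. One possible shape for that is a dedicated ping route, sketched below. The /ping path, the "pong" body, and the helper names pingHandler and createPingServer are assumptions for illustration only.

```javascript
"use strict";
var http = require("http");

// Hypothetical handler: answers requests for /ping with a 200 and a tiny
// body. Returns true if it handled the request, false otherwise.
function pingHandler(req, res) {
  if (req.url === "/ping") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("pong");
    return true;
  }
  return false;
}

// Hypothetical server factory: everything except /ping gets a 404 here.
// A real app would hand unhandled requests to its normal routes instead.
function createPingServer() {
  return http.createServer(function (req, res) {
    if (!pingHandler(req, res)) {
      res.writeHead(404);
      res.end();
    }
  });
}

// Usage (port 3000 is an arbitrary choice):
//   createPingServer().listen(3000);
```

Keeping the route free of database or upstream calls means a failed probe points at the Node.js process itself rather than at one of its dependencies.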
MAKE IT HOT 
Everything up to this point should have 
hot failover.
DATACENTER 
Hot failover across 
datacenters? 
Typically very costly 
But, the real deal if you're serious
DISASTER RECOVERY 
"Business Continuity" 
Don't let a technological problem end your business 
Have a worst case, "lose some data" recovery plan
2. AVOID FAILURES 
Get out of the business of failover 
when you don't have to do it yourself.
RESOURCES TO NOT SUPPORT 
Don't rely on system / service 
resources you don't need to. 
Disk: NAS, disks, SSDs. 
Datastores: DB, cloud services. 
... Load Balancers, DNS, etc.
HOW TO AVOID 
Use SAAS wherever possible! (DB, LBs, storage). 
Or PAAS for some Node.js apps. 
Design stateless, fungible servers (no disk risks).
3. ISOLATE FAILURES 
Isolate failures you can't 
avoid.
RESOURCES TO SUPPORT 
Look to resources you must depend on: 
CPU/Load: Run out of this and it's over. 
HTTP: Each different host you hit. 
Datastores: Connections? Different Hosts? 
... also, memory, I/O, etc. and combinations thereof
SOME ANECDOTES 
Node.js apps can be bad neighbors. 
DB (auto-suggest) vs. HTTP (vendor translations) 
DB (CRUD app) vs. CPU/Load (co-located PHP app) 
Read vs. Write DB operations.
HOW TO ISOLATE 
Create "micro-services" that stand on their own. 
Monitor for cross-pressure and respond. (Next section!)
4. ANALYZE EVERYTHING 
Data drives problem discovery 
and action.
LOG, MONITOR, MINE
DECISIONS, GOALS 
Things to look for in Node.js apps...
Identify
  Resource pressure: CPU, I/O, memory, network
  Performance: Throughput, latency
  Errors/Bugs: Quantitative, qualitative
Decide
  Scale up, scale down?
  Separate services?
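As a rough illustration of collecting the signals listed above from inside a Node.js process, the sketch below samples load and memory and keeps simple counters for throughput, latency, and errors. The helper names sampleResources and createStats and the field names are hypothetical. A real deployment would more likely ship these numbers to a logging or monitoring service than keep them in memory.

```javascript
"use strict";
var os = require("os");

// Hypothetical helper: one point-in-time sample of resource pressure.
function sampleResources() {
  var mem = process.memoryUsage();
  return {
    time: new Date().toISOString(),
    load1: os.loadavg()[0],      // 1-minute load average (always 0 on Windows)
    rssBytes: mem.rss,           // resident set size of this process
    heapUsedBytes: mem.heapUsed  // V8 heap currently in use
  };
}

// Hypothetical helper: counters for request count, errors, and total latency.
function createStats() {
  var stats = { count: 0, errors: 0, totalMs: 0 };

  // `startHr` is the value process.hrtime() returned when the work began.
  stats.record = function (startHr, err) {
    var diff = process.hrtime(startHr); // [seconds, nanoseconds] elapsed
    stats.count += 1;
    stats.totalMs += diff[0] * 1e3 + diff[1] / 1e6;
    if (err) { stats.errors += 1; }
  };

  return stats;
}

// Usage sketch:
//   var stats = createStats();
//   var start = process.hrtime();
//   doWork(function (err) { stats.record(start, err); }); // doWork is a placeholder
```

Comparing the resource samples against the latency and error counters over time is one way to spot the cross-pressure cases from the anecdotes slide and to decide whether to scale or split a service.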
RECAP 
Design for failure 
Avoid 
Isolate 
Analyze
THANKS! 
ssw2014.formidablelabs.com | @ryan_roemer formidablelabs.com
