How we survived
Hurricane Sandy
a look at what not to do

Dan White
@airplanedan
Mike Zaic
VP, Engineering, CafeMom
@focustrate

Photo: TenSafeFrogs (flickr)
Introduction
Calm before the Storm
Into the Storm
Lessons Learned
CafeMom
• Founded: 2006
• Social Network for moms
• > 3 million members
• > 10 million monthly UVs
• ~ 130 employees
– Tech: 13 people
– Sales/Community: bulk of people
Introduction
Calm before the Storm
Into the Storm
Lessons Learned
Our Application(s)
Sites: CafeMom, MamasLatinas, The Stir, Que Mas
Pieces: stats, photos, video, answers, profiles, sponsorships, advice, advertising, admin tools, email, language switching, games, groups
DevOps Focus and Pace
• Sales Driven!
• Limited resources
• Product shifting
• Priorities – refactoring?
Photo: árticotropical (flickr)
Architecture
• Hosting
– Single physical datacenter
– Redundancy? pffff

• Cloud Presence
– Limited to specific parts of the app
– No database required
Photo: Arthur40A (flickr)
Infrastructure Growth (timeline, 2006 to 2013)
minimal hardware
rapid growth = infrastructure expands
continued growth = optimize for scale
slower growth = stagnant infrastructure
stagnant infrastructure + more functionality = tightly coupled bird's nest
“If NYC has serious issues we’ll all have more important things to worry about…”
Introduction
Calm before the Storm
Into the Storm
Lessons Learned
Day 1: Bring on Sandy!
Day 2: waiting and hoping
Day 3: false hope

Photo: raneko (flickr)
Day 4: code with no data

Photo: mandiberg (flickr)
Day 5: generator anticipation
Day 6: generator #fail
Day 7: everything… works?
Introduction
Calm before the Storm
Into the Storm
Lessons Learned
Technical Takeaways
what we did
• Redundancies
• Code without data
• DNS propagation
• Cloud replication
– Codebase
– Database

Photo: Robert S. Donovan (flickr)
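
The "DNS propagation" point above comes down to simple arithmetic. As a minimal sketch, assuming resolvers honor the record's TTL: a resolver that cached the old record just before the change can keep serving it for up to one full TTL. The function names and the timestamp are hypothetical, and 86400 is an illustrative value; only the 300-second TTL appears in the talk notes.

```python
from datetime import datetime, timedelta


def worst_case_cutover(change_time: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment a TTL-honoring resolver could still serve the old record."""
    if old_ttl_seconds < 0:
        raise ValueError("TTL must be non-negative")
    return change_time + timedelta(seconds=old_ttl_seconds)


def stale_window_minutes(old_ttl_seconds: int) -> float:
    """Length of the worst-case stale window, in minutes."""
    return old_ttl_seconds / 60


if __name__ == "__main__":
    change = datetime(2000, 1, 1, 12, 0, 0)  # placeholder timestamp, not from the talk
    for ttl in (86400, 300):  # 86400 is illustrative; 300 is the TTL named in the notes
        print(ttl, stale_window_minutes(ttl), worst_case_cutover(change, ttl))
```

The TTL that matters is the one in effect before the change, so it has to be lowered ahead of an emergency rather than during one.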
Technical Takeaways
next steps

• Architecting for degraded service?
• Simplify application?
• Second physical site?
• Geographic redundancy?
• Undoing "fixes" of failovers?

Photo: paul bica (flickr)
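
One way to picture "architecting for degraded service" is a read-only switch in front of the application. This is a sketch under stated assumptions, not CafeMom's implementation: the `handle` function, the `Response` type, and the `read_only` flag are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# HTTP methods that do not modify state and can be served from a read-only copy.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


@dataclass
class Response:
    status: int
    body: str


def handle(method: str, path: str, read_only: bool) -> Response:
    """Serve reads normally; reject writes with 503 while in read-only mode."""
    if read_only and method.upper() not in SAFE_METHODS:
        return Response(503, "Site is temporarily read-only. Please try again later.")
    return Response(200, f"ok: {method.upper()} {path}")


if __name__ == "__main__":
    print(handle("GET", "/groups", read_only=True))
    print(handle("POST", "/groups", read_only=True))
```

The design question the slide raises still applies: someone has to decide which requests count as safe, and how the flag gets turned off again.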
Overarching Problems
• Don’t trust vendor/provider assurances!
• Managing expectations during an outage
• Feelings of helplessness
• Scrambling doesn't necessarily get things up faster
• Must know downtime tolerance
• Cost of DR planning vs Lost opportunity of outage
• DR Planning is insurance
Know when to say when
• When is it appropriate to plan for DR?
• When is it appropriate to build for DR?
• When is it appropriate to retrofit DR for existing app?
• When is it appropriate to dictate product requirements based on ease of DR?
• During an outage - when is it appropriate to do nothing (and just sleep)?
When is it not a great time to think about DR?
During a disaster.
(unless you’re thinking about the next one)
Moving forward – inertia sucks!
• Change is tough
• Mature infrastructure/app resists drastic change
• Pace of development: inertia against stepping back and revisiting
• Pace of business: inertia against non-revenue projects
• No recent disasters: inertia against DR necessity
baby steps
thank you

Photo: Lisa Bettany {Mostly Lisa} (flickr)


Editor's Notes

  1. This conference is in NYC – assuming people remember “Hurricane Sandy” – end of October, last year? This is what Manhattan looked like. We work for CafeMom.com – Dan was the Senior Vice President of Technology at the time, Mike ran the development team. Our datacenter was in lower Manhattan, and our site went down for about a week. We’re hoping that you can learn from some of our mistakes, and use us as a case study to bring to your organizations to emphasize the need for business continuity and disaster recovery. Let’s get started.
  2. Other site mentions:
     – MamasLatinas (launched Jan 2012)
     – The Stir / Que Mas
     User base:
     – Rabid
     – Expectation of realtime
     – Very vocal
     – Not shy about complaining: CHANGE IS BAD!!!
  3. The breadth of application pieces is both loosely and tightly coupled:
     – lots of unrelated functionality
     – reliance on common data (friends activity!)
     – interconnected code, complex logic
  4. Hosting: “Who’s got multiple datacenters?”
     Cloud presence:
     – EC2, S3, CloudFront
     – Basically for offloading bits of the app
     Resiliency:
     – built into the application
     – can handle outages at: application, server, network device, power circuit, internet pipe
  5. This sentence sums up our prior mindset with regard to a single datacenter in NYC.
  6. Almost 1 year ago: end of October, into November.
     Daily account, framed in terms of:
     1. Communication
     2. Technical / Physical
  7. Communication
     – 10/29 9:07am: DG email: we're aware of the storm and all is good
     – Physical locations of Dan/Mike
     – 5:46pm: DG says ready for the storm
     – An hour later: site's down; communicated to DG, communicated to company
     – 9:25pm: everything's back! Communicated to company... prematurely
     – 9:43pm: out for good; let the company know that night
     – Why? Fuel pumps in flooded basement!
     – Re-tell 9/11 stuff
     Technical
     – Static EC2 fail page
     – Switching nameservers caused problems
     – ssh access to production
  8. Communication
     – From DG re: pumping water
     – Army Corps of Engineers help
     – Seemingly misleading/contradictory reports
     – Set up script to monitor DG page
     – Us to company throughout with updates
     – Communication with Gawker, eyes on ground: stuff is happening!
     Technical
     – Inbound email: cloud-hosted email, but still routed through DG mailserver, which means no inbound external email; needed to repoint Postini directly to 365
     – Even after the switch, took a while to unspool
     – We did get it working
     – thestir WordPress: DNS, thestirlive.cafemom.com
  9. Communication
     Frequent DG updates:
     – Fiber optic cables severed?
     – Some of our datacenters are up; we're not so bad
     – Water still being pumped
     – Mobile generator on the way
     Lots of false hope:
     – Only 2 feet of water left to pump
     – Fiber connectivity good
     – ETA on generator = 5pm tomorrow
     Finally reached out to the team to make sure everyone was ok; San Diego contingent was ok
     Get a statement from CEO
     Continuous stream of idiotic emails:
     – BI emails
     – Sales tickets
     – Reporting questions
     – "My mouse is broken"
     Technical / Physical
     – Around midnight Barry started driving into Manhattan for a snatch and grab
  10. Technical
      – Barry got the server: 7 floors, pitch black stairwell up AND down
      – brought it home, racked it on his coffee table
      – started to copy code base / config to EC2
      – Start re-architecting app to work in the cloud
      Communication
      – All generator based: stuck in traffic
      – ETA of 5pm reiterated up until 3pm
      – 8:05pm: find out that it's 100 miles away in stop and go traffic, but almost here!
      – BUT... thank god the generator was delayed, because the cables didn't get there until after 10:30
      Note: from the time that we started getting updates about the generator to the generator actually getting there felt like an ETERNITY. AN ETERNITY.
  11. Communication
      – All generator updates
      – Generator in traffic, 4 hours to go 13 blocks
      – 12:45pm: another generator on its way
      – 2pm: good thing we have another entering the city; first one is unsafe, so we can't use it anyway
      – 2:07pm: it arrived; took 7 hours for final connection
      Technical
      – EC2: servers set up, configs set up
      – reverse engineered schemas, still no data
  12. Communication
      – 11/03 1:31am: DG site update: GENERATOR IS RUNNING!
      – 11/03 8:00am: DG site update: EMERGENCY GENERATOR SHUTDOWN!
      – 11/03 10:31am: DG site update: GENERATOR IS RUNNING AGAIN!
      – 11/03 5:50pm: DG site update: EMERGENCY GENERATOR SHUTDOWN AGAIN!
      – 11/03 11:20pm: DG site update: generator techs on site working to repair
      – IT: just go get servers
      Technical / Physical
      – Sent 2 IT guys to be onsite, hands on
      – Started testing feasibility of using RDS => EC2 better
  13. Technical / Physical
      – Cesar/Eric go up 25 flights, grab 3 servers (all dedicated backup slaves, so site integrity not compromised if power is restored)
      – Set up shop at Barry's house
      – Get to work on bringing DBs to EC2; took hours
      – Messages table: huge; row chunks for bigger tables
      Communication
      – Generator not just broken, but irreparably broken. Needs to be replaced
      – New generator onsite within an hour and a half
      – Up and running at 10:16am
      – Site came back mid-afternoon
      Issues once it gets up?
      – DB corruption, fixing slaves
      – Priming
      – Switching DNS back over, lower TTL (300s)
      – Fresh cache
      – Some slaves missing (on Barry’s kitchen counter)
      11/04 6:31pm: @cafemom officially tweets that the site is back up. They have a contest with prizes to try and get people coming back.
      For purposes of this talk, we’ll cut it off here. Lingering issues for a few days.
  14. This is what our traffic looked like, per GA. So what did we learn? We’ll look at:
      – the tech end of things
      – the business end of things
      – inertia
  15. In Hindsight
      Redundancies
      – servers removed from architecture weren't critical... no impact to site infrastructure integrity when power restored
      – some servers suffered damage, but redundancies mean we're ok
      Code without data
      – code without data doesn't help for most of cafemom/mamaslatinas (stir/quemas functionality could have been replicated, but no post history)
      DNS propagation
      – strategy to avoid DNS propagation would have been nice (reverse proxy?)
      – DG -> GoDaddy -> Route 53
      – Watch your TTLs!
      Cloud replication
      – Codebase: did DB and full request load testing after we had data
      – Database: set up EC2-based slave for realtime offsite DB; had to determine if RDS could give required IO performance (dedicated high IOPS could) --- BUT.. RDS couldn’t be slaved?
  16. Next Steps
      – investigated ASAP hosting (e.g. Rackspace): would have taken too long
      – once service restored, shifted investigation to alternate primary datacenter and potential secondary physical DR hosting
      – any potential to simplify application for easier DR planning? (maybe less realtime, read only version for nonmembers and emergencies?)
      – HOW do you pick? Who decides? How do you back out of the failover if you have to use it?
  17. All the scrambling during the outage didn't get us up faster; it just got us an EC2-hosted fail page and WordPress sites (after the DNS nameserver changes finally propagated).
      Scrambling could have caused more issues if costly "undoing" had been necessary (bring servers back to DG, repoint DNS, undo workarounds, etc.).
      Managing expectations and keeping people informed of (lack of) progress, especially when you're operating in the dark and relaying poor information.
      POOR information in general. Gawker was on site, but the details weren’t especially helpful other than just knowing that people were, in fact, doing things.
  18. After retrofit question: building DR into a new application is easy(er) vs an existing application.
      The more complex the application, usually the more complex the DR solution.
      Know when to say when (no sleep for no reason?): should we have exhausted ourselves, potentially opening us up to big mistakes, when we knew we couldn't do anything to help the current situation?
  19. Last day: 2 concurrent paths, Dan and I working on different problems.
      – Dan: working on getting the site up
      – Me: working on having a fallback
  20. Here we are, a year later – and not much has changed other than that which followed in the weeks after Sandy. We finally put in a ticket for a fully fleshed out DR site on March 4 of this year – as you can see above. It’s normal priority, with no due date, and has sat there because it’s never been the same priority as it was following the storm. This screen shot is from this past weekend. We are making progress, though. We have a fallback solution, and we’re meeting this week to talk specifically about DR. We’re building out our engineering team some, so we might have the resources to allocate to it. We’re taking baby steps.
  21. Conclude with: We’re hoping that you can learn from some of our mistakes, and use us as a case study to bring to your organizations to emphasize the need for business continuity and disaster recovery. Thanks for listening. We’d like to open the floor up for questions.
  22. Thanks! We’d like to open the floor up for questions.
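
Note 13 mentions copying the bigger tables (such as the messages table) in row chunks. A minimal sketch of that idea follows, with loud assumptions: the `messages(id, body)` schema is hypothetical, and Python's built-in `sqlite3` stands in for the real source and destination databases, which the notes do not describe in enough detail to reproduce.

```python
import sqlite3
from typing import Iterator, Tuple


def id_chunks(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) primary-key ranges covering min_id..max_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1


def copy_in_chunks(src: sqlite3.Connection, dst: sqlite3.Connection, chunk_size: int) -> int:
    """Copy the hypothetical messages(id, body) table one key range at a time."""
    lo, hi = src.execute("SELECT MIN(id), MAX(id) FROM messages").fetchone()
    if lo is None:  # empty source table
        return 0
    copied = 0
    for start, end in id_chunks(lo, hi, chunk_size):
        rows = src.execute(
            "SELECT id, body FROM messages WHERE id BETWEEN ? AND ?", (start, end)
        ).fetchall()
        dst.executemany("INSERT INTO messages (id, body) VALUES (?, ?)", rows)
        dst.commit()  # commit per chunk so progress survives an interruption
        copied += len(rows)
    return copied
```

Chunking by primary-key range keeps each transfer small, and because every chunk is committed on its own, an interrupted copy can pick up from the last completed range instead of starting over.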