SlideShare a Scribd company logo
Design Review Best Practices 
Velocity Barcelona 2014 
Mandi Walls 
Consulting Director, EMEA, Chef 
mandi@getchef.com
whoami 
• Mandi Walls 
• Professional Services for Chef 
• @lnxchk
Why Design Reviews? 
• Your opportunity to get a look at an incoming project 
• Hopefully early enough to have influence! 
Talk to development before it’s too late to make changes to support Ops
What You Want 
• Maximize the conversation 
• Get a good feel for important details 
• Don’t get too in the weeds, focus on what alleviates the most 
headaches
Set Your Goals 
• Initial deployment of the application 
• Runtime management and day-to-day operations 
• Releases and upgrades 
• Handling Outages
Have A Checklist 
• Combination of global requirements and needs specific to the project 
• Doesn’t have to be exhaustive, or hit everything the first time 
• Sample: http://bit.ly/lnxchk_reviewcklist 
• Sets goals by layer 
• You can also set goals by activity 
• It’s a living document 
• Make changes as you learn, launch more projects
Some Application 
FE 
Web Tier 
LB 
App Tier 
External 
Services 
Data Tier BI
Investigating Layer by Layer 
• Different components will require different focus! 
• Some topics will hit all layers
Frontend Entry Point 
• To look at during deploy 
• Gathering all FQDNs 
• Determining SSL Requirements 
• Looking at Geographic Topologies
FE Gotchas: Rewrite Rules, Divided Farms 
• Sophisticated load balancing equipment allow for lots of cleverness 
• Documentation of intent is key 
• Later migrations, requirements changes are more difficult 
/foo 
/bar
Web Tier 
• Server type and version 
• Access methods to other tiers 
• Caching? 
• Locations 
• Modules and their configurations
Web Insanity: Custom HTTP Servers 
https://www.flickr.com/photos/16210667@N02/13973604274
Application Tier 
• Basics: server type and version, port requirements, protocols 
• Outbound connections to outside services 
• User affinity 
• Necessary libraries and their release cycles
App Tier Gotchas 
• Things that affect scaling, replacement of hosts 
• Whitelisting! Why!? 
• What changes require restarts? 
• New FE connections? New BE connections to cache? To DB?
Data Tier – Our Data 
• In multi-location topologies, where is the master? 
• Is there whitelisting or other app authorization? 
• What are compliance/reporting requirements? 
• SOX? HIPAA? PCI? 
• Will schemas be versioned?
Doing Horrible Things to Data 
• Premature optimization can hamstring a project 
• Are the data tiers susceptible to queuing? 
• Management of data should be automated
Data Tier – Imports and ETL 
• Schedules 
• Measuring successful load 
• Contact information for upstream provider 
• Storage locations and intended size
Operationalizing ETL 
• Some of the fiddliest fiddly bits 
https://www.flickr.com/photos/joming/2182022195
Data Tier – Exports, Reporting, BI 
• Packaging 
• Scheduling 
• Live replica versus full dump and export 
• Should have no impact on production operation of the datastore!
External APIs - Consumer 
• Relying on services you don’t own is tricky business 
• Account name and owner 
• SLA of service provider 
• Contact info for outages
Graceful Degradation 
• What does the App do if the external service is unavailable? 
• Shouldn’t hang 
• Should log some message 
• Should have some reasonable timeout, plus a hold down 
• Users shouldn’t notice
Red Button 
• Can you completely disconnect the code from a broken service? 
• Who can make the call? 
https://www.flickr.com/photos/bluetsunami/535781178
Gotchas for All Tiers 
• Host firewalls 
• Application user accounts – in the central service? 
• Logging policy 
• Metrics collection 
• Backups 
• OS Versions
Alerts 
• A stacktrace is not an alertable event 
• Log hygiene is important – what’s in the prod logs needs to be useful 
• If it’s worth writing to disk, someone should know what to do with it 
2014-06-01:16:44:00:UTC ERROR: $$$$$$$$ 
2014-06-01:16:44:07:UTC ERROR: Host Not Found
Releases 
• Schedule – Over night? Weekends? Rolling continuous deploy? 
• Getting operations requirements into Dev cycle 
• Security requirements 
• Upgrades and patches 
• Service restarts 
• Graceful horizontal restarts 
• Full-zone downtime
Nightmare: Attaching Prod App to Dev DB 
• All artifacts should ship production ready 
• Localized configurations ready in advance 
• Lean on config management tools for other environments 
https://www.flickr.com/photos/garon/121923087
Software Library 
• Set organizational standards 
• Rely on your OS vendors when possible 
• Pre-check that all dependencies are available in all repos for all 
environments 
• Suppress the urge to build special packages for absolutely 
everything 
• Streamlined, repeatable, reliable installs
Spelunking Through History 
• Keeping up with OS releases makes later scaling, replacement of 
dead hosts smoother 
• Always be current or no more than one release back in all envs
Performance and Tuning 
• Hard to be concise with new projects 
• Performance regression should be part of the testing process! 
• Know in advance what tunables exist, what their indicators are 
• Heap 
• GC 
• Network stacks 
• In-memory caching
Outage Planning 
• Known failure patterns and indicators 
• Who is oncall from Dev? 
• Who is oncall from Product or other decision-making stakeholders?
The Dark Art of Healthchecks 
• At some point, there should be a check that works back through the 
whole stack for status 
• These can be expensive 
• You have to know, when you hit the FE, that it is able to serve all the 
components of the app
Static Failover Pages 
• Mechanism for not serving up 500s or blanks during downtime 
• Set to non-cachable, show a picture, give a better experience
Other Team Info 
• Know who’s who 
• Dev teams should have some sort of oncall in case of problems Ops 
can’t fix 
• Intake process for bug reports, feedback, getting Ops issues fixed 
• Like those crazy log messages! 
• Get invited to team meetings! Go occasionally!
Success Metrics 
• What will success actually look like after launch? 
• Is the product team watching a particular set of KPIs? 
• How will the data for success be collected 
• BI, beacons, third party collectors
Things That are Challenging to Find in Review 
• Bottlenecks 
• That flat table-top graph isn’t a good thing 
• Impact of third-party inclusions 
• Rigor of the integration testing environment 
• What parts of the system users are really going to use 
• People might like something you thought was going to be minor
Learn From Experience 
• Over time, the infrastructure build out will get easier 
• Operations teams develop their own heuristics for difficulty 
• Always be learning and improving 
• Give everyone a chance to learn and improve too 
• Create standards so you don’t have to investigate all dark corners 
• Update your checklist periodically
Thanks!
Design Reviews for Operations - Velocity Europe 2014

More Related Content

What's hot

Change management in hybrid landscapes
Change management in hybrid landscapesChange management in hybrid landscapes
Change management in hybrid landscapes
Chris Kernaghan
 
VMworld 2013: Examining vSphere Design Through a Design Scenario
VMworld 2013: Examining vSphere Design Through a Design Scenario VMworld 2013: Examining vSphere Design Through a Design Scenario
VMworld 2013: Examining vSphere Design Through a Design Scenario
VMworld
 
Pain points of agile development
Pain points of agile developmentPain points of agile development
Pain points of agile developmentPerforce
 
Extreme Makeover OnBase Edition
Extreme Makeover OnBase EditionExtreme Makeover OnBase Edition
Extreme Makeover OnBase Edition
DataBank, A KYOCERA Group Company
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
Rohan Rasane
 
Operating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesOperating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud Microservices
Noriaki Tatsumi
 
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail EarlySuccess recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail EarlyJoseph Vargheese PMP CSM CSP
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
Noriaki Tatsumi
 
Agile Secure Cloud Application Development Management
Agile Secure Cloud Application Development ManagementAgile Secure Cloud Application Development Management
Agile Secure Cloud Application Development Management
Adam Getchell
 
SAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environmentSAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environment
Chris Kernaghan
 
Resolving problems & high availability
Resolving problems & high availabilityResolving problems & high availability
Resolving problems & high availability
Zend by Rogue Wave Software
 
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good ServerICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
Serdar Basegmez
 
DevOps for Database webinar
DevOps for Database webinarDevOps for Database webinar
DevOps for Database webinar
DBmaestro - Database DevOps
 
DevOps+Data: Working with Source Control
DevOps+Data: Working with Source ControlDevOps+Data: Working with Source Control
DevOps+Data: Working with Source Control
Ed Leighton-Dick
 
How Mature is Your Infrastructure?
How Mature is Your Infrastructure?How Mature is Your Infrastructure?
How Mature is Your Infrastructure?
Gary Stafford
 
SharePoint and the Lean Enterprise
SharePoint and the Lean EnterpriseSharePoint and the Lean Enterprise
SharePoint and the Lean Enterprise
Dave Healey
 
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
Eneko Jon Bilbao
 
PM, Scrum and TFS - Ivan Marković
PM, Scrum and TFS - Ivan MarkovićPM, Scrum and TFS - Ivan Marković
PM, Scrum and TFS - Ivan Marković
Software StartUp Academy Osijek
 
Challenges and best practices of database continuous delivery
Challenges and best practices of database continuous deliveryChallenges and best practices of database continuous delivery
Challenges and best practices of database continuous delivery
DBmaestro - Database DevOps
 
Go Faster with Lightning Process Builder
Go Faster with Lightning Process BuilderGo Faster with Lightning Process Builder
Go Faster with Lightning Process Builder
Salesforce Developers
 

What's hot (20)

Change management in hybrid landscapes
Change management in hybrid landscapesChange management in hybrid landscapes
Change management in hybrid landscapes
 
VMworld 2013: Examining vSphere Design Through a Design Scenario
VMworld 2013: Examining vSphere Design Through a Design Scenario VMworld 2013: Examining vSphere Design Through a Design Scenario
VMworld 2013: Examining vSphere Design Through a Design Scenario
 
Pain points of agile development
Pain points of agile developmentPain points of agile development
Pain points of agile development
 
Extreme Makeover OnBase Edition
Extreme Makeover OnBase EditionExtreme Makeover OnBase Edition
Extreme Makeover OnBase Edition
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
 
Operating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesOperating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud Microservices
 
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail EarlySuccess recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Agile Secure Cloud Application Development Management
Agile Secure Cloud Application Development ManagementAgile Secure Cloud Application Development Management
Agile Secure Cloud Application Development Management
 
SAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environmentSAP TechEd 2013 session Tec118 managing your-environment
SAP TechEd 2013 session Tec118 managing your-environment
 
Resolving problems & high availability
Resolving problems & high availabilityResolving problems & high availability
Resolving problems & high availability
 
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good ServerICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
 
DevOps for Database webinar
DevOps for Database webinarDevOps for Database webinar
DevOps for Database webinar
 
DevOps+Data: Working with Source Control
DevOps+Data: Working with Source ControlDevOps+Data: Working with Source Control
DevOps+Data: Working with Source Control
 
How Mature is Your Infrastructure?
How Mature is Your Infrastructure?How Mature is Your Infrastructure?
How Mature is Your Infrastructure?
 
SharePoint and the Lean Enterprise
SharePoint and the Lean EnterpriseSharePoint and the Lean Enterprise
SharePoint and the Lean Enterprise
 
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
2015 Mastering SAP Tech - Enterprise Mobility - Testing Lessons Learned
 
PM, Scrum and TFS - Ivan Marković
PM, Scrum and TFS - Ivan MarkovićPM, Scrum and TFS - Ivan Marković
PM, Scrum and TFS - Ivan Marković
 
Challenges and best practices of database continuous delivery
Challenges and best practices of database continuous deliveryChallenges and best practices of database continuous delivery
Challenges and best practices of database continuous delivery
 
Go Faster with Lightning Process Builder
Go Faster with Lightning Process BuilderGo Faster with Lightning Process Builder
Go Faster with Lightning Process Builder
 

Similar to Design Reviews for Operations - Velocity Europe 2014

When small problems become big problems
When small problems become big problemsWhen small problems become big problems
When small problems become big problems
Adrian Cole
 
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
Datapolis
 
158 - Product Management for Enterprise-Grade platforms
158 - Product Management for Enterprise-Grade platforms 158 - Product Management for Enterprise-Grade platforms
158 - Product Management for Enterprise-Grade platforms
ProductCamp Boston
 
Project Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on DockerProject Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on Docker
RightScale
 
Patching is Your Friend in the New World Order of EPM and ERP Cloud
Patching is Your Friend in the New World Order of EPM and ERP CloudPatching is Your Friend in the New World Order of EPM and ERP Cloud
Patching is Your Friend in the New World Order of EPM and ERP Cloud
Datavail
 
Lifecycle Management with SharePoint Apps and Solutions
Lifecycle Management with SharePoint Apps and SolutionsLifecycle Management with SharePoint Apps and Solutions
Lifecycle Management with SharePoint Apps and Solutions
SPC Adriatics
 
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Andy Talbot
 
Software development planning and essentials
Software development planning and essentialsSoftware development planning and essentials
Software development planning and essentials
Rajesh P
 
Software development planning and essentials
Software development planning and essentialsSoftware development planning and essentials
Software development planning and essentials
Rajesh P
 
Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017
Chris Kernaghan
 
ALM with TFS: From the Drawing Board to the Cloud
ALM with TFS: From the Drawing Board to the CloudALM with TFS: From the Drawing Board to the Cloud
ALM with TFS: From the Drawing Board to the Cloud
Jeremy Likness
 
Why retail companies can't afford database downtime
Why retail companies can't afford database downtimeWhy retail companies can't afford database downtime
Why retail companies can't afford database downtime
DBmaestro - Database DevOps
 
Making the Transition from Suite to the Hub
Making the Transition from Suite to the HubMaking the Transition from Suite to the Hub
Making the Transition from Suite to the Hub
Black Duck by Synopsys
 
Keeping in Touch -- Collaborative Technologies
Keeping in Touch -- Collaborative TechnologiesKeeping in Touch -- Collaborative Technologies
Keeping in Touch -- Collaborative Technologies
IABC Houston
 
Mixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting exampleMixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting example
corehard_by
 
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Eggplant
 
Building SharePoint Enterprise Platforms - Off the beaten path
Building SharePoint Enterprise Platforms - Off the beaten pathBuilding SharePoint Enterprise Platforms - Off the beaten path
Building SharePoint Enterprise Platforms - Off the beaten path
Andy Talbot
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklugdominion
 
Transforming Enterprise Teams to DevOps Workflows
Transforming Enterprise Teams to DevOps WorkflowsTransforming Enterprise Teams to DevOps Workflows
Transforming Enterprise Teams to DevOps Workflows
Mandi Walls
 

Similar to Design Reviews for Operations - Velocity Europe 2014 (20)

When small problems become big problems
When small problems become big problemsWhen small problems become big problems
When small problems become big problems
 
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
Datapolis Guest Expert Presentation: Limitations of SharePoint Designer by Bj...
 
158 - Product Management for Enterprise-Grade platforms
158 - Product Management for Enterprise-Grade platforms 158 - Product Management for Enterprise-Grade platforms
158 - Product Management for Enterprise-Grade platforms
 
Project Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on DockerProject Sherpa: How RightScale Went All in on Docker
Project Sherpa: How RightScale Went All in on Docker
 
Patching is Your Friend in the New World Order of EPM and ERP Cloud
Patching is Your Friend in the New World Order of EPM and ERP CloudPatching is Your Friend in the New World Order of EPM and ERP Cloud
Patching is Your Friend in the New World Order of EPM and ERP Cloud
 
Lifecycle Management with SharePoint Apps and Solutions
Lifecycle Management with SharePoint Apps and SolutionsLifecycle Management with SharePoint Apps and Solutions
Lifecycle Management with SharePoint Apps and Solutions
 
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...
 
Software development planning and essentials
Software development planning and essentialsSoftware development planning and essentials
Software development planning and essentials
 
Software development planning and essentials
Software development planning and essentialsSoftware development planning and essentials
Software development planning and essentials
 
Software development life cycle
Software development life cycleSoftware development life cycle
Software development life cycle
 
Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017
 
ALM with TFS: From the Drawing Board to the Cloud
ALM with TFS: From the Drawing Board to the CloudALM with TFS: From the Drawing Board to the Cloud
ALM with TFS: From the Drawing Board to the Cloud
 
Why retail companies can't afford database downtime
Why retail companies can't afford database downtimeWhy retail companies can't afford database downtime
Why retail companies can't afford database downtime
 
Making the Transition from Suite to the Hub
Making the Transition from Suite to the HubMaking the Transition from Suite to the Hub
Making the Transition from Suite to the Hub
 
Keeping in Touch -- Collaborative Technologies
Keeping in Touch -- Collaborative TechnologiesKeeping in Touch -- Collaborative Technologies
Keeping in Touch -- Collaborative Technologies
 
Mixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting exampleMixing d ps building architecture on the cross cutting example
Mixing d ps building architecture on the cross cutting example
 
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
 
Building SharePoint Enterprise Platforms - Off the beaten path
Building SharePoint Enterprise Platforms - Off the beaten pathBuilding SharePoint Enterprise Platforms - Off the beaten path
Building SharePoint Enterprise Platforms - Off the beaten path
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklug
 
Transforming Enterprise Teams to DevOps Workflows
Transforming Enterprise Teams to DevOps WorkflowsTransforming Enterprise Teams to DevOps Workflows
Transforming Enterprise Teams to DevOps Workflows
 

More from Mandi Walls

DOD Raleigh Gamedays with Chaos Engineering.pdf
DOD Raleigh Gamedays with Chaos Engineering.pdfDOD Raleigh Gamedays with Chaos Engineering.pdf
DOD Raleigh Gamedays with Chaos Engineering.pdf
Mandi Walls
 
Addo reducing trauma in organizations with SLOs and chaos engineering
Addo  reducing trauma in organizations with SLOs and chaos engineeringAddo  reducing trauma in organizations with SLOs and chaos engineering
Addo reducing trauma in organizations with SLOs and chaos engineering
Mandi Walls
 
Full Service Ownership
Full Service OwnershipFull Service Ownership
Full Service Ownership
Mandi Walls
 
PagerDuty: Best Practices for On Call Teams
PagerDuty: Best Practices for On Call TeamsPagerDuty: Best Practices for On Call Teams
PagerDuty: Best Practices for On Call Teams
Mandi Walls
 
InSpec at DevOps ATL Meetup January 22, 2020
InSpec at DevOps ATL Meetup January 22, 2020InSpec at DevOps ATL Meetup January 22, 2020
InSpec at DevOps ATL Meetup January 22, 2020
Mandi Walls
 
Prescriptive Security with InSpec - All Things Open 2019
Prescriptive Security with InSpec - All Things Open 2019Prescriptive Security with InSpec - All Things Open 2019
Prescriptive Security with InSpec - All Things Open 2019
Mandi Walls
 
Using Chef InSpec for Infrastructure Security
Using Chef InSpec for Infrastructure SecurityUsing Chef InSpec for Infrastructure Security
Using Chef InSpec for Infrastructure Security
Mandi Walls
 
Adding Security to Your Workflow With InSpec - SCaLE17x
Adding Security to Your Workflow With InSpec - SCaLE17xAdding Security to Your Workflow With InSpec - SCaLE17x
Adding Security to Your Workflow With InSpec - SCaLE17x
Mandi Walls
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Mandi Walls
 
BuildStuff.LT 2018 InSpec Workshop
BuildStuff.LT 2018 InSpec WorkshopBuildStuff.LT 2018 InSpec Workshop
BuildStuff.LT 2018 InSpec Workshop
Mandi Walls
 
InSpec Workshop at Velocity London 2018
InSpec Workshop at Velocity London 2018InSpec Workshop at Velocity London 2018
InSpec Workshop at Velocity London 2018
Mandi Walls
 
DevOpsDays InSpec Workshop
DevOpsDays InSpec WorkshopDevOpsDays InSpec Workshop
DevOpsDays InSpec Workshop
Mandi Walls
 
Adding Security and Compliance to Your Workflow with InSpec
Adding Security and Compliance to Your Workflow with InSpecAdding Security and Compliance to Your Workflow with InSpec
Adding Security and Compliance to Your Workflow with InSpec
Mandi Walls
 
InSpec - June 2018 at Open28.be
InSpec - June 2018 at Open28.beInSpec - June 2018 at Open28.be
InSpec - June 2018 at Open28.be
Mandi Walls
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
Mandi Walls
 
Ingite Slides for InSpec
Ingite Slides for InSpecIngite Slides for InSpec
Ingite Slides for InSpec
Mandi Walls
 
Habitat at LinuxLab IT
Habitat at LinuxLab ITHabitat at LinuxLab IT
Habitat at LinuxLab IT
Mandi Walls
 
InSpec Workshop DevSecCon 2017
InSpec Workshop DevSecCon 2017InSpec Workshop DevSecCon 2017
InSpec Workshop DevSecCon 2017
Mandi Walls
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
Mandi Walls
 
InSpec Workflow for DevOpsDays Riga 2017
InSpec Workflow for DevOpsDays Riga 2017InSpec Workflow for DevOpsDays Riga 2017
InSpec Workflow for DevOpsDays Riga 2017
Mandi Walls
 

More from Mandi Walls (20)

DOD Raleigh Gamedays with Chaos Engineering.pdf
DOD Raleigh Gamedays with Chaos Engineering.pdfDOD Raleigh Gamedays with Chaos Engineering.pdf
DOD Raleigh Gamedays with Chaos Engineering.pdf
 
Addo reducing trauma in organizations with SLOs and chaos engineering
Addo  reducing trauma in organizations with SLOs and chaos engineeringAddo  reducing trauma in organizations with SLOs and chaos engineering
Addo reducing trauma in organizations with SLOs and chaos engineering
 
Full Service Ownership
Full Service OwnershipFull Service Ownership
Full Service Ownership
 
PagerDuty: Best Practices for On Call Teams
PagerDuty: Best Practices for On Call TeamsPagerDuty: Best Practices for On Call Teams
PagerDuty: Best Practices for On Call Teams
 
InSpec at DevOps ATL Meetup January 22, 2020
InSpec at DevOps ATL Meetup January 22, 2020InSpec at DevOps ATL Meetup January 22, 2020
InSpec at DevOps ATL Meetup January 22, 2020
 
Prescriptive Security with InSpec - All Things Open 2019
Prescriptive Security with InSpec - All Things Open 2019Prescriptive Security with InSpec - All Things Open 2019
Prescriptive Security with InSpec - All Things Open 2019
 
Using Chef InSpec for Infrastructure Security
Using Chef InSpec for Infrastructure SecurityUsing Chef InSpec for Infrastructure Security
Using Chef InSpec for Infrastructure Security
 
Adding Security to Your Workflow With InSpec - SCaLE17x
Adding Security to Your Workflow With InSpec - SCaLE17xAdding Security to Your Workflow With InSpec - SCaLE17x
Adding Security to Your Workflow With InSpec - SCaLE17x
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
 
BuildStuff.LT 2018 InSpec Workshop
BuildStuff.LT 2018 InSpec WorkshopBuildStuff.LT 2018 InSpec Workshop
BuildStuff.LT 2018 InSpec Workshop
 
InSpec Workshop at Velocity London 2018
InSpec Workshop at Velocity London 2018InSpec Workshop at Velocity London 2018
InSpec Workshop at Velocity London 2018
 
DevOpsDays InSpec Workshop
DevOpsDays InSpec WorkshopDevOpsDays InSpec Workshop
DevOpsDays InSpec Workshop
 
Adding Security and Compliance to Your Workflow with InSpec
Adding Security and Compliance to Your Workflow with InSpecAdding Security and Compliance to Your Workflow with InSpec
Adding Security and Compliance to Your Workflow with InSpec
 
InSpec - June 2018 at Open28.be
InSpec - June 2018 at Open28.beInSpec - June 2018 at Open28.be
InSpec - June 2018 at Open28.be
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
 
Ingite Slides for InSpec
Ingite Slides for InSpecIngite Slides for InSpec
Ingite Slides for InSpec
 
Habitat at LinuxLab IT
Habitat at LinuxLab ITHabitat at LinuxLab IT
Habitat at LinuxLab IT
 
InSpec Workshop DevSecCon 2017
InSpec Workshop DevSecCon 2017InSpec Workshop DevSecCon 2017
InSpec Workshop DevSecCon 2017
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
 
InSpec Workflow for DevOpsDays Riga 2017
InSpec Workflow for DevOpsDays Riga 2017InSpec Workflow for DevOpsDays Riga 2017
InSpec Workflow for DevOpsDays Riga 2017
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

Design Reviews for Operations - Velocity Europe 2014

  • 1. Design Review Best Practices Velocity Barcelona 2014 Mandi Walls Consulting Director, EMEA, Chef mandi@getchef.com
  • 2. whoami • Mandi Walls • Professional Services for Chef • @lnxchk
  • 3. Why Design Reviews? • Your opportunity to get a look at an incoming project • Hopefully early enough to have influence! Talk to development before it’s too late to make changes to support Ops
  • 4. What You Want • Maximize the conversation • Get a good feel for important details • Don’t get too in the weeds, focus on what alleviates the most headaches
  • 5. Set Your Goals • Initial deployment of the application • Runtime management and day-to-day operations • Releases and upgrades • Handling Outages
  • 6. Have A Checklist • Combination of global requirements and needs specific to the project • Doesn’t have to be exhaustive, or hit everything the first time • Sample: http://bit.ly/lnxchk_reviewcklist • Sets goals by layer • You can also set goals by activity • It’s a living document • Make changes as you learn, launch more projects
  • 7. Some Application FE Web Tier LB App Tier External Services Data Tier BI
  • 8. Investigating Layer by Layer • Different components will require different focus! • Some topics will hit all layers
  • 9. Frontend Entry Point • To look at during deploy • Gathering all FQDNs • Determining SSL Requirements • Looking at Geographic Topologies
  • 10. FE Gotchas: Rewrite Rules, Divided Farms • Sophisticated load balancing equipment allow for lots of cleverness • Documentation of intent is key • Later migrations, requirements changes are more difficult /foo /bar
  • 11. Web Tier • Server type and version • Access methods to other tiers • Caching? • Locations • Modules and their configurations
  • 12. Web Insanity: Custom HTTP Servers https://www.flickr.com/photos/16210667@N02/13973604274
  • 13. Application Tier • Basics: server type and version, port requirements, protocols • Outbound connections to outside services • User affinity • Necessary libraries and their release cycles
  • 14. App Tier Gotchas • Things that affect scaling, replacement of hosts • Whitelisting! Why!? • What changes require restarts? • New FE connections? New BE connections to cache? To DB?
  • 15. Data Tier – Our Data • In multi-location topologies, where is the master? • Is there whitelisting or other app authorization? • What are compliance/reporting requirements? • SOX? HIPAA? PCI? • Will schemas be versioned?
  • 16. Doing Horrible Things to Data • Premature optimization can hamstring a project • Are the data tiers susceptible to queuing? • Management of data should be automated
  • 17. Data Tier – Imports and ETL • Schedules • Measuring successful load • Contact information for upstream provider • Storage locations and intended size
  • 18. Operationalizing ETL • Some of the fiddliest fiddly bits https://www.flickr.com/photos/joming/2182022195
  • 19. Data Tier – Exports, Reporting, BI • Packaging • Scheduling • Live replica versus full dump and export • Should have no impact on production operation of the datastore!
  • 20. External APIs - Consumer • Relying on services you don’t own is tricky business • Account name and owner • SLA of service provider • Contact info for outages
  • 21. Graceful Degradation • What does the App do if the external service is unavailable? • Shouldn’t hang • Should log some message • Should have some reasonable timeout, plus a hold down • Users shouldn’t notice
  • 22. Red Button • Can you completely disconnect the code from a broken service? • Who can make the call? https://www.flickr.com/photos/bluetsunami/535781178
  • 23. Gotchas for All Tiers • Host firewalls • Application user accounts – in the central service? • Logging policy • Metrics collection • Backups • OS Versions
  • 24. Alerts • A stacktrace is not an alertable event • Log hygiene is important – what’s in the prod logs needs to be useful • If it’s worth writing to disk, someone should know what to do with it 2014-06-01:16:44:00:UTC ERROR: $$$$$$$$ 2014-06-01:16:44:07:UTC ERROR: Host Not Found
  • 25. Releases • Schedule – Over night? Weekends? Rolling continuous deploy? • Getting operations requirements into Dev cycle • Security requirements • Upgrades and patches • Service restarts • Graceful horizontal restarts • Full-zone downtime
  • 26. Nightmare: Attaching Prod App to Dev DB • All artifacts should ship production ready • Localized configurations ready in advance • Lean on config management tools for other environments https://www.flickr.com/photos/garon/121923087
  • 27. Software Library • Set organizational standards • Rely on your OS vendors when possible • Pre-check that all dependencies are available in all repos for all environments • Suppress the urge to build special packages for absolutely everything • Streamlined, repeatable, reliable installs
  • 28. Spelunking Through History • Keeping up with OS releases makes later scaling, replacement of dead hosts smoother • Always be current or no more than one release back in all envs
  • 29. Performance and Tuning • Hard to be concise with new projects • Performance regression should be part of the testing process! • Know in advance what tunables exist, what their indicators are • Heap • GC • Network stacks • In-memory caching
  • 30. Outage Planning • Known failure patterns and indicators • Who is oncall from Dev? • Who is oncall from Product or other decision-making stakeholders?
  • 31. The Dark Art of Healthchecks • At some point, there should be a check that works back through the whole stack for status • These can be expensive • You have to know, when you hit the FE, that it is able to serve all the components of the app
  • 32. Static Failover Pages • Mechanism for not serving up 500s or blanks during downtime • Set to non-cachable, show a picture, give a better experience
  • 33. Other Team Info • Know who’s who • Dev teams should have some sort of oncall in case of problems Ops can’t fix • Intake process for bug reports, feedback, getting Ops issues fixed • Like those crazy log messages! • Get invited to team meetings! Go occasionally!
  • 34. Success Metrics • What will success actually look like after launch? • Is the product team watching a particular set of KPIs? • How will the data for success be collected • BI, beacons, third party collectors
  • 35. Things That are Challenging to Find in Review • Bottlenecks • That flat table-top graph isn’t a good thing • Impact of third-party inclusions • Rigor of the integration testing environment • What parts of the system users are really going to use • People might like something you thought was going to be minor
  • 36. Learn From Experience • Over time, the infrastructure build out will get easier • Operations teams develop their own heuristics for difficulty • Always be learning and improving • Give everyone a chance to learn and improve too • Create standards so you don’t have to investigate all dark corners • Update your checklist periodically

Editor's Notes

  1. There are a lot of tools that promise to help with ETL. Many of them produce some kind of job script or war file or similar that has no logging, or predictable error reporting, or clean stop/start. They may dissolve into chaos if the data is a little bit wonky. Complex pipelines of data transformations are very hard to test, replicate, and debug.
  2. Red buttoning is one of the easiest ways to control your application when an upstream service is misbehaving. The button is there to completely bypass any code reliant on the service, and continue the application flow. These buttons should be easy to manipulate, maybe in a CMS, or in a configuration file that is periodically re-read by the application. The intent is to not have to restart your service because someone else is having an outage. Red button should be a manual process; automatic graceful degradation with hold down will take care of transient errors. A semi-permanent use of a red button is when the upstream provider has changed their API, and your application has not been prepared; red button the feature until the app is fixed and updated.
  3. The last thing you want to do with a running application in distress is dig around hoping you have the software used to build the farm originally. Keep application platforms updated; don’t let any environments get behind.
  4. We’ve learned a lot about post mortems, and talking about failure in general, over the past couple of years. A design review gives you the opportunity to revisit lessons learned from your systems history and make sure the learnings from those activities get incorporated into new systems.