Humans by the hundred (DevOps Days Ohio)


1) The document discusses how human fallibility and growing development teams cause the probability of a successful deployment to decay exponentially as the number of changes per deploy increases. 2) It recommends approaches such as adopting a service-oriented architecture, focusing on mean time to recovery rather than only on preventing failures, and establishing "Ops deputies" to help mitigate this problem. 3) The key message is that organizations should embrace DevOps culture, tailor their approaches to their specific teams, plan for inevitable change, and treat components as distributed from the beginning.

Humans By The Hundred
DevOps Days Ohio 2015

$ whoami
SRE Manager at Yelp
CWRU Alum
Pittsburgh native
<3 web operations
Just a dude

Yelp’s Mission:
Connecting people with great
local businesses.

Deployment: the early days
Get a few people together in Slack/IRC/etc.
Merge up the code
Run the tests
Poke at it in stage
Cross your fingers

Things get slower...
Tests take longer to run
More hosts = longer downloads, bounces
More developers = more eyeballs
More features = more code

The Problem: Humans Are Fallible
“…oh @$#&”

The Problem, With Math
Assume:
Every change has a chance of success: 98%
That means no test failures, no reverts, etc.
Every deploy has a number of changes: n
Any failure in the pipeline invalidates the deploy
Let’s figure out the probability of a successful deployment: p

The Problem, With Math
Only you
p = 98%
You and a friend
p = .98 * .98 ≈ 96%
You and nine co-workers
p = .98 * .98 * .98 * … * .98 ≈ 82%

The Problem, With Math
p = (.98)^n
exponential decay!
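
The arithmetic on these slides can be checked with a short sketch. It assumes, as the slides do, that every change succeeds independently with probability 98%; the function name `deploy_success_probability` and the sample values of n are illustrative and not from the talk.

```python
def deploy_success_probability(n: int, per_change: float = 0.98) -> float:
    """Probability that all n independent changes in one deploy succeed."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return per_change ** n


# Sample deploy sizes (hypothetical); 1, 2 and 10 match the slides.
for n in (1, 2, 10, 50, 100):
    print(f"n={n:>3}  p={deploy_success_probability(n):.0%}")
```

Under the same assumption, a deploy carrying 50 changes succeeds roughly 36% of the time, and one carrying 100 changes roughly 13% of the time.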

This doesn’t scale!
More developers = more changes
More changes = longer deploys
Longer deploys = less time to develop
Less time to develop = slower to iterate

Making it harder to screw up
Write more tests
Write better tests
Get better code reviews
Get better infrastructure
Switch programming languages
Use better tools

Just write better software and stop
making mistakes!

The Real World
Testing builds confidence in our changes
Testing does not protect you from failure
Better tools, tests, and infrastructure can raise our success rates

Service-Oriented Architecture
Large monolith → smaller services
Services communicate over network
Usually HTTP, but you can do RPC, SOAP, etc.
Service = independent code base
Independent deployments

Service-Oriented Architecture
Benefits
Smaller code bases = upper bound to n
Failure domains become isolated
Technology independence
Federated responsibility
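
One way to read "upper bound to n" together with isolated failure domains is sketched below. It is an illustration under stated assumptions, not a model from the talk: changes succeed independently at 98%, a failed deploy ships none of its changes, and the 100 changes and 10 services are hypothetical numbers.

```python
def deploy_success_probability(n: int, per_change: float = 0.98) -> float:
    """Probability that all n independent changes in one deploy succeed."""
    return per_change ** n


def expected_changes_shipped(total_changes: int, deploys: int,
                             per_change: float = 0.98) -> float:
    """Expected number of changes that ship when total_changes are split
    evenly across independent deploys and a failed deploy ships nothing."""
    if deploys <= 0 or total_changes % deploys != 0:
        raise ValueError("total_changes must divide evenly across deploys")
    per_deploy = total_changes // deploys
    return deploys * per_deploy * deploy_success_probability(per_deploy, per_change)


# Hypothetical workload: 100 changes, shipped as one monolith deploy
# or as ten independent service deploys of 10 changes each.
monolith = expected_changes_shipped(100, deploys=1)
services = expected_changes_shipped(100, deploys=10)
print(f"one deploy of 100 changes: ~{monolith:.0f} changes expected to ship")
print(f"ten deploys of 10 changes: ~{services:.0f} changes expected to ship")
```

The chance that every one of the ten deploys succeeds is still (.98)^100; what changes is that a single bad change only invalidates the deploy of the service it belongs to.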

Service-Oriented Architecture
Drawbacks
Everything becomes decoupled
Function calls start looking like HTTP requests
Versioning can be a nightmare
Tracking dependencies is hard
Data consistency becomes challenging
End-to-end testing becomes hard(er), if not impossible

Conquering SOA
With the monolith, it’s easy to focus on
mean time between failures (MTBF)

Conquering SOA
In a SOA, focus on mean time to recovery
(MTTR)
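
To see why recovery time matters as much as failure frequency, here is a small sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The formula is general reliability arithmetic rather than something stated in the talk, and the hour values are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure every 100 hours, 1 hour to recover.
baseline = availability(100.0, 1.0)
fewer_failures = availability(200.0, 1.0)    # double MTBF
faster_recovery = availability(100.0, 0.5)   # halve MTTR

print(f"baseline:        {baseline:.4%}")
print(f"double the MTBF: {fewer_failures:.4%}")
print(f"halve the MTTR:  {faster_recovery:.4%}")
```

Under this formula, halving recovery time buys the same availability as doubling the time between failures, which is the trade the MTTR focus relies on when fast iteration makes quick recovery the cheaper of the two.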

Conquering SOA
Fail fast
Anticipate failure
Leverage iteration speed to recover fast

Conquering SOA
Treat everything as distributed
Pick a size that works for you
micro
macro
somewhere in between
Size doesn’t have to be uniform!

Ops Deputies
Developers ‘deputized’ to do operations
Elevated privileges
Tackle infrastructure needs for their team
Contribute improvements to shared infra
Become first-hop for operations questions

Glue
Define interfaces between components
Makes it easy to swap out later
Expect change
Minimize code, keep it simple
Think about EOL when you start
Your company is changing, and so will your needs
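
A minimal sketch of what "define interfaces between components" can look like in code. Everything here is hypothetical and chosen for illustration (the `KeyValueStore` interface, both implementations, and `remember_deploy`); the talk does not prescribe a language or an API.

```python
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface between a service and its storage backend."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Simple first implementation: a plain dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class AuditedStore:
    """Later replacement: wraps any store and records which keys were written."""

    def __init__(self, inner: KeyValueStore) -> None:
        self._inner = inner
        self.written_keys: List[str] = []

    def get(self, key: str) -> Optional[str]:
        return self._inner.get(key)

    def put(self, key: str, value: str) -> None:
        self.written_keys.append(key)
        self._inner.put(key, value)


def remember_deploy(store: KeyValueStore, service: str, version: str) -> None:
    """Glue code depends only on the interface, not on a concrete backend."""
    store.put(f"deploy/{service}", version)


first = InMemoryStore()
remember_deploy(first, "search", "v1")

swapped = AuditedStore(InMemoryStore())
remember_deploy(swapped, "search", "v2")
print(first.get("deploy/search"), swapped.get("deploy/search"), swapped.written_keys)
```

Because `remember_deploy` only knows about `KeyValueStore`, the backend can be swapped without touching the glue code, which keeps that code small and easy to retire.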

Embrace DevOps
DevOps culture, not just the technology!

Do what’s right for you
Don’t let dogma rule! Tailor your approaches to the
talent around you.

Plan for change
Bit rot is real. Plan how you’re going to deal with it!

@YelpEngineering
YelpEngineers
engineeringblog.yelp.com
github.com/yelp

Humans by the hundred (DevOps Days Ohio)

1. Humans By The Hundred DevOps Days Ohio 2015

2. $ whoami
SRE Manager at Yelp
CWRU Alum
Pittsburgh native
<3 web operations
Just a dude

3. Yelp’s Mission: Connecting people with great local businesses.

4. Yelp Stats: As of Q3 2015
89M
32
71%
90M

5. Growth

9. Growth means embracing change.

10. Growth means embracing DevOps.

11.

12.

13.

14.

15.

16. DevOps: someone having your back.

17.

18.

19. Dogma

20. In the Beginning...

21. Deployment: the early days
Get a few people together in slack/irc/etc.
Merge up the code
Run the tests
Poke at it in stage
Cross your fingers

22.

23.

24. Things get slower...
Tests take longer to run
More hosts = longer downloads, bounces
More developers = more eyeballs
More features = more code

25. The Problem:

26. The Problem: Humans Are Fallible “…oh @$#&”

27.

28. The Problem, With Math
Assume:
Every change has a chance of success: 98%
That means no test failures, no reverts, etc.
Every deploy has a number of changes: n
Any failure in the pipeline invalidates the deploy
Let’s figure out the probability of a successful deployment: p

29. The Problem, With Math
Only you
p = 98%
You and a friend
p = .98 * .98 = 96%
You and nine co-workers
p = .98 * .98 * .98 * … * .98 = 82%

30. The Problem, With Math
p = (.98)^n

31. The Problem, With Math
p = (.98)^n
exponential decay!
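To make the decay concrete, here is a minimal Python sketch of the formula on these slides. It assumes, as the talk does, that every change succeeds independently with probability 98%; the function name is illustrative and does not come from the talk.

```python
def deploy_success_probability(n_changes: int, per_change_success: float = 0.98) -> float:
    """Probability that a deploy with n independent changes has no failures."""
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    # Every change must succeed, so the individual probabilities multiply.
    return per_change_success ** n_changes


if __name__ == "__main__":
    for n in (1, 2, 10, 20, 50, 100):
        print(f"n={n:3d}  p={deploy_success_probability(n):.1%}")
```

Running it reproduces the slide figures for one, two, and ten changes, and shows how quickly the odds drop as n keeps growing.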

32.

33. This doesn’t scale!
More developers = more changes
More changes = longer deploys
Longer deploys = less time to develop
Less time to develop = slower to iterate

34. Mitigating Exponential Decay
p = (.98)^n

35. Mitigating Exponential Decay
p = (.98)^n

36.

37. Making it harder to screw up
Write more tests
Write better tests
Get better code reviews
Get better infrastructure
Switch programming languages
Use better tools

38. Just write better software and stop making mistakes!

39. PROBLEM SOLVED

40.

41. The Real World
Testing builds confidence in our changes
Testing does not protect you from failure
Better tools, tests, and infrastructure can raise our success rates
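A rough way to see why raising the success rate helps but does not remove the problem: the sketch below compares a few per-change success rates at different deploy sizes. The 99% and 99.5% rates are hypothetical values chosen for illustration; only the 98% figure comes from the talk.

```python
def deploy_success_probability(n_changes: int, per_change_success: float) -> float:
    """Probability that all n independent changes in a deploy succeed."""
    return per_change_success ** n_changes


if __name__ == "__main__":
    rates = (0.98, 0.99, 0.995)   # 0.98 is from the talk; the others are assumed
    sizes = (10, 20, 100)         # number of changes per deploy
    for rate in rates:
        row = "  ".join(
            f"n={n}: {deploy_success_probability(n, rate):.1%}" for n in sizes
        )
        print(f"rate={rate:.1%}  {row}")
```

Under these assumptions a better rate flattens the curve, yet the curve is still exponential in n: a large enough deploy at 99.5% is less likely to succeed than a small one at 98%.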

42. Mitigating Exponential Decay p = (.98)n

43. Mitigating Exponential Decay p = (.98)n

44. Service-Oriented Architecture
Large monolith → smaller services
Services communicate over network
Usually HTTP, but you can do RPC, SOAP, etc.
Service = independent code base
Independent deployments

45. Service-Oriented Architecture
Benefits
Smaller code bases = upper bound to n
Failure domains become isolated
Technology independence
Federated responsibility

46. Service-Oriented Architecture
Drawbacks
everything becomes decoupled
function calls start looking like HTTP requests
versioning can be a nightmare
tracking dependencies is hard
data consistency becomes challenging
end-to-end testing becomes hard(er), if not impossible
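The "upper bound to n" benefit can be sketched numerically. The example below assumes a monolith deploy carrying 20 changes versus four hypothetical services carrying 5 changes each, with the talk's 98% per-change success rate and independent changes; the split is invented for illustration.

```python
import math

PER_CHANGE_SUCCESS = 0.98  # figure used in the talk


def deploy_success_probability(n_changes: int) -> float:
    """Probability that a single deploy of n independent changes succeeds."""
    return PER_CHANGE_SUCCESS ** n_changes


monolith_changes = 20              # all changes ship together
service_changes = [5, 5, 5, 5]     # same changes, split across four services

monolith_p = deploy_success_probability(monolith_changes)
service_ps = [deploy_success_probability(n) for n in service_changes]

if __name__ == "__main__":
    print(f"monolith deploy:     {monolith_p:.1%}")
    for i, p in enumerate(service_ps, start=1):
        print(f"service {i} deploy:    {p:.1%}")
    # The chance that every change succeeds is the same either way...
    print(f"all services succeed: {math.prod(service_ps):.1%}")
```

The chance that all 20 changes succeed is unchanged. What changes is the blast radius: a failed change only invalidates its own service's small deploy rather than everyone's.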

47. SOA scales people, not code.

48. Conquering SOA
With the monolith, it’s easy to focus on mean time between failures (MTBF)

49. Conquering SOA
In a SOA, focus on mean time to recovery (MTTR)

50. Conquering SOA
Fail fast
Anticipate failure
Leverage iteration speed to recover fast
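One way to see why MTTR is a reasonable thing to focus on: the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR), is not on the slides, but under it halving recovery time is worth exactly as much as doubling time between failures. The numbers below are made up for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure every 100 hours, 1 hour to recover.
baseline = availability(mtbf_hours=100, mttr_hours=1)
double_mtbf = availability(mtbf_hours=200, mttr_hours=1)    # fail half as often
half_mttr = availability(mtbf_hours=100, mttr_hours=0.5)    # recover twice as fast

if __name__ == "__main__":
    print(f"baseline:     {baseline:.4%}")
    print(f"double MTBF:  {double_mtbf:.4%}")
    print(f"halve MTTR:   {half_mttr:.4%}")
```

If small, independent deploys make fast recovery cheaper than preventing every failure, this is the arithmetic behind preferring the MTTR lever.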

51. Conquering SOA
Treat everything as distributed
Pick a size that works for you
micro
macro
somewhere in between
Size doesn’t have to be uniform!

52. PROBLEM SOLVED

53. Spreading the Love

54.

55.

56.

57. The Problem: Humans Specialize

58. Ops Deputies
Developers ‘deputized’ to do operations
Elevated privileges
Tackle infrastructure needs for their team
Contribute improvements to shared infra
Become first-hop for operations questions

59.

60.

61. DEVOPS

62.

63.

64. Reinventing the Shuttle

65. vs.

66.

67. Build Buy

68. Build Buy

69. Composition

70. You

71. You ? Your new thing

72. You You fill in these bits

73. PaaSTA

74. =

75. Glue
Define interfaces between components
Makes it easy to swap out later
Expect change
Minimize code, keep it simple
Think about EOL when you start
Your company is changing, and so will your needs
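The "define interfaces between components" advice might look like the following minimal Python sketch. Every name here (Deployer, ShellScriptDeployer, SchedulerDeployer, release) is hypothetical and invented for illustration; none of it reflects how PaaSTA or any Yelp system is actually built.

```python
from typing import Protocol


class Deployer(Protocol):
    """The interface the glue code depends on (hypothetical)."""

    def deploy(self, service: str, version: str) -> str: ...


class ShellScriptDeployer:
    """Stand-in for an early, homegrown deploy mechanism."""

    def deploy(self, service: str, version: str) -> str:
        return f"script: deployed {service}@{version}"


class SchedulerDeployer:
    """Stand-in for a later replacement, e.g. an off-the-shelf scheduler."""

    def deploy(self, service: str, version: str) -> str:
        return f"scheduler: deployed {service}@{version}"


def release(deployer: Deployer, service: str, version: str) -> str:
    # The glue only knows about the interface, so the component behind it
    # can be swapped without touching this function.
    return deployer.deploy(service, version)


if __name__ == "__main__":
    print(release(ShellScriptDeployer(), "web", "1.2.3"))
    print(release(SchedulerDeployer(), "web", "1.2.3"))
```

Keeping the glue this thin is the point: when a component reaches end of life, only the class behind the interface needs replacing.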

76. Parting Words

77. Embrace DevOps
DevOps culture, not just the technology!

78. Do what’s right for you
Don’t let dogma rule!
Tailor your approaches to the talent around you.

79. Plan for change
Bit rot is real.
Plan how you’re going to deal with it!

80. @YelpEngineering
YelpEngineers
engineeringblog.yelp.com
github.com/yelp

81. yelp.com/careers

82. THANK YOU

Editor's Notes

4.5 years at Yelp, 80 people -> hundreds. Just going to talk about what I’ve learned along the way.
For this talk to make sense, we have to also talk about what Yelp is. Connect people w/ great local businesses
Approx. 83 million UMVs via mobile. More than 83 million reviews contributed since inception. Approx. 68% of all searches on Yelp came from mobile (mobile web & app). Yelp is present across 32 countries.
I’m here to talk about growth, and specifically growth in people
You might be growing your budding startup (I should warn you that I’m from the internet. This deck is mostly silly images.)
It might be a merger or acquisition
Maybe you’re expanding into new markets
More developers = more improvements, features, products
DevOps = cultural appreciation of change, not fear
Infra engineer = agent of flexibility. DevOps provides tools and flexibility in how we manage and define infra.
Infrastructure is what allows our software and products to exist. Infra provides flexibility to the software it supports.
that infrastructure also scales people!
and that infrastructure can be just as hard to change as your products, if not harder
This is where DevOps changes from convenient to critical Embrace fundamental culture shift that spawned it
DevOps is a collaborative philosophy at its core I like to think about DevOps as being about someone having your back. Allies just as much as collaborators. This is absolutely critical if you want to survive growth.
The world is a messy place. The right answer isn’t always clear. There are always a dozen counter-examples to any decent-sounding rule
Teams are made of very different kinds of people Every combination is unique and has its own chemistry
In a growing company, dogma kills. That’s the only real advice I have. You could say I have a dogma against dogmas. Dogma is anti-DevOps
This is how most projects, companies, etc. start: single code repo, maybe a server or two, and one or a couple of developers The entire project is easy to conceptualize. It fits in your head. This is true for things like configuration management, too! Small number of people makes overhead easy to cope with
And then ship it! Dump the code into production, probably restart everything. Click around, make sure stuff looks good. Maybe you’ve even got error monitoring! This works for a while, and it’s all you need when you get started.
But then time passes, and the monolith grows. You add features. You add developers to make those features.
As you add code and scale out your org + infrastructure, things naturally take longer. What was once a 10 minute deploy process might take closer to 30 minutes… or 45 minutes… or maybe even an hour! ...but that’s not a big deal, right?
And here we run into a problem.
HUMANS SCREW UP
As you grow, you’re doing more stuff. More code, features, tests, changes More stuff means more chances to screw up, which we do, because we are humans. And when you screw up, it means back to square one… new build, new test run, new deployment. ...and everyone has lost as much time as it takes to get this far. And they’ll have to invest it all again to get their code out!
This starts looking pretty grim even around 10 branches, and that’s assuming a (generous) 98% success rate! At 20 branches we’re below 70%!
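A quick sketch of the arithmetic behind these numbers, assuming (as the slides do) that every change succeeds independently with the same probability. The function name is illustrative and not from the talk.

```python
def deploy_success_probability(per_change_success: float, changes: int) -> float:
    """Chance that every change in one deploy succeeds.

    Assumes each change succeeds independently with the same probability,
    and that any single failure invalidates the whole deploy.
    """
    return per_change_success ** changes


if __name__ == "__main__":
    # 0.98 is the per-change success rate assumed in the slides.
    for n in (1, 2, 10, 20):
        p = deploy_success_probability(0.98, n)
        print(f"{n:>2} changes: {p:.0%}")
```

With a 98% per-change rate, 10 changes land at roughly 82% and 20 changes at roughly 67%, which is the "below 70%" figure above.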
So… how can we do better?
Well, we can try to improve this number...
Make it harder to screw up! Decrease the chances of failure. This is where almost all teams start focusing their efforts first.
Here are some common ways people try to make screwing up more difficult.
It’s easy!
This doesn’t work in the real world. At the end of the day, we’re still human We’re people! We make mistakes! We just spent a long time talking about how we’re fallible. Why would this be any less true of the systems we create to prevent us from making mistakes?
In reality, doing all those things does help
But at the end of the day, you need more to scale an org. We want the asymptotic solutions, not the constant factor.
...and of course, as computing professionals, you’ve all probably been writhing in your seats, trying to tell me to do this first
We tackle this asymptotic factor with SOA. Split up large code bases into smaller ones that can be developed independently and communicate over common interfaces. How you size this is up to you. Don’t fall prey to the hype of “microservices” if it doesn’t make sense for you.
It is a lot harder to do SOA than a monolith, and it can decrease your rate of success dramatically! It takes a lot of effort and discipline to get it right. However, it’s very difficult to obtain the advantages it provides any other way.
Embrace the idea that failures will happen, and be ready for them! In a world like this, you need to safeguard your deployment process. It’s a problem if it gets slow, because it’s your out when you screw up.
In an SOA, sprawl is easy. Technology diversity. Infrastructure is no longer big, common, shared. Teams tailor to their uses and needs
Trying to support this all with a single team doesn’t work. Before you know it, the infra is chasing you. Operator:developer ratio needed to support this increases *dramatically*
you need to share the responsibility! this also means sharing the authority for these systems
Not everyone cares about infrastructure! ...and that’s just fine! But there will always be some people who are interested in what happens behind the scenes. Leverage them!
Programs like this are tunable, depending on your needs and your size. You could opt to only have a few
Or you could give it to absolutely everyone! It just depends on how your company is structured, the nature of your products, and how specialized you need parts of your team to be. Just get the people who are interested involved!
This is one of our biggest manifestations of DevOps. We empower people to have our back, and in turn we have theirs.
If you don’t have a structure like this, and you’re a growing company, you’re headed towards having this much bandwidth between you and the rest of the organization. And that org is changing!
Having a program like Ops Deputies bought us high-bandwidth transit to the entire organization. We can use this to promote values we traditionally care about, while also staying in touch with planned change. This is the fabric of collaboration. This is also how you minimize MTTR.
Reinventing the space shuttle
Build vs. Buy Classic tradeoff Often examined as a binary choice: you’ll pick one or the other
But, like we talked about before: the world is a messy, ambiguous place
In reality, build vs. buy is more of a spectrum
...and you have a lot of freedom to move around in that spectrum as it makes sense to!
We can navigate this spectrum using composition. This is kind of the same as the Unix philosophy: tools and systems should do one thing well. Build a solution out of that.
Thinking about build vs. buy as binary fundamentally fails to acknowledge how diverse the world is
This component shouldn’t be a big box that requires you to conform to it. Either you buy software that doesn’t quite fit, or you re-invent wheels
Instead, if you piece together the functionality you need from software that does it well, all you have to fill in is the glue. Nobody else can provide that for you! You know your organization! Much easier maintenance than writing it all yourself. The puzzle and the contour of the org will change
We love this pattern at Yelp. How we’ve approached big system design in the last few years, worked out well. Our emerging flagship of this is PaaSTA: Platform as a Service, Totally Awesome!
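
The "glue" idea from the notes above (define interfaces between components so they are easy to swap out later) can be sketched roughly as follows. Every name here is hypothetical and chosen only for illustration; none of it reflects how Yelp or PaaSTA actually implements this.

```python
from typing import Protocol


class Deployer(Protocol):
    """Hypothetical interface that the glue code depends on."""

    def deploy(self, service: str, version: str) -> str: ...


class ScriptDeployer:
    """Stand-in for a home-grown deploy script."""

    def deploy(self, service: str, version: str) -> str:
        return f"script: {service}@{version}"


class SchedulerDeployer:
    """Stand-in for an off-the-shelf scheduler adopted later."""

    def deploy(self, service: str, version: str) -> str:
        return f"scheduler: {service}@{version}"


def ship(deployer: Deployer, service: str, version: str) -> str:
    # The glue only knows the interface, so the component behind it
    # can be replaced without touching this function.
    return deployer.deploy(service, version)


if __name__ == "__main__":
    print(ship(ScriptDeployer(), "web", "1.0"))
    print(ship(SchedulerDeployer(), "web", "1.0"))
```

Keeping the glue this thin is the point: when the organization's needs change, only the component behind the interface has to be retired.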
