Presented at ALM Summit 3 in Redmond, WA. January 2013.
Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.
http://www.etsy.com/careers
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr (John Allspaw)
Communications and cooperation between development and operations isn't optional, it's mandatory. Flickr takes the idea of "release early, release often" to an extreme - on a normal day there are 10 full deployments of the site to our servers. This session discusses why this rate of change works so well, and the culture and technology needed to make it possible.
DevOps practices like continuous testing and shifting testing left in the development lifecycle can improve software quality and reduce issues. Continuous testing involves testing early and often at every stage using test automation. This provides rapid feedback to prevent defects. Tools like Puppet, Ansible, and Docker can automate testing, configuration, and deployment to support continuous integration and delivery.
Manual testing interview questions by infotech (suhasreddy1)
The document provides information about manual software testing practices including definitions of priority and severity for defects, examples of high severity low priority defects, bases for test case review, contents of requirements documents, differences between web application and client server testing, examples of defect reporting, bug lifecycles, and approaches to regression testing. Key details covered include assigning priority by developers and severity by testers, focusing regression testing on modules impacted by fixes, and updating test cases based on changes to functionality or code.
The presentation about the fundamentals of DevOps workflow and CI/CD practices I presented at Centroida (https://centroida.ai/) as a back-end development intern.
I gave this presentation on 5/17 to the New Mexico VMUG in Santa Fe. The presentation provides an overview of OpenStack, what it is (and isn't), and some things you might learn to get started with OpenStack.
The Doctrine ORM offers far more flexibility than it appears. In this presentation we look at how it works internally and at its lesser-known features, to discover how to use it better. On the agenda: events and listeners, filters, tracking policies, plus tips on possible architectures for your code...
Test cases are used to systematically test software and verify requirements. A test case contains a set of steps, expected results, and actual results. It has a name, description, prerequisites, and test data. Each test case contains multiple test steps that verify a discrete action. Best practices for writing test cases include avoiding jargon, writing steps independently, and focusing on positive scenarios. Test cases are organized into templates with required fields and naming conventions to facilitate management in testing tools.
This document discusses testing RESTful web services using REST Assured. It provides an overview of REST and HTTP methods like GET, POST, PUT, and DELETE. It explains why API automation is valuable for early defect detection, contract validation, and stopping builds on failure. REST Assured allows testing and validating REST services in Java and integrates with frameworks like JUnit and TestNG. It provides methods to format HTTP requests, send requests, and validate status codes and response data. REST Assured also handles authentication mechanisms. The document provides instructions on adding the REST Assured Maven dependency and writing tests, including an example of a GET request.
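The deck's own examples use REST Assured in Java; as a rough illustration of the same flow (send a GET, assert on the status code and body), here is a minimal sketch in Python with the requests library, against a purely hypothetical endpoint and field names:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical service used only for illustration


def test_get_user_returns_200_and_a_name():
    # Send the request (the equivalent of REST Assured's given().when().get(...)).
    response = requests.get(f"{BASE_URL}/users/1", timeout=5)

    # Validate the status code (like .then().statusCode(200)).
    assert response.status_code == 200

    # Validate part of the JSON response body (like .body("name", notNullValue())).
    body = response.json()
    assert "name" in body and body["name"]
```

Run under a test runner such as pytest, a failing assertion like this is what lets CI stop a build before a broken API reaches later stages.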
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD... (DevOpsDays Tel Aviv)
This document discusses best practices for site reliability engineering (SRE). It recommends hiring only coders, establishing service level agreements (SLAs) and measuring performance against them. It also suggests using error budgets, maintaining a common staffing pool for SRE and development teams, ensuring on-call teams have at least 8 people, and conducting post-mortems after every incident. Key reliability metrics like availability, latency, throughput and quality are identified. Objectives, service level objectives (SLOs) and responses if the error budget is exceeded or exhausted are outlined.
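To make the error-budget idea concrete, the usual arithmetic is that an SLO leaves the remaining fraction of the measurement window as budget; a 99.9% availability objective over 30 days allows roughly 43 minutes of downtime. A tiny sketch (the SLO value and window are just example numbers):

```python
# Error budget for an availability SLO over a rolling window.
slo = 0.999                    # 99.9% availability objective (example value)
window_minutes = 30 * 24 * 60  # 30-day window = 43,200 minutes

error_budget_minutes = (1 - slo) * window_minutes
print(f"Allowed downtime per 30 days: {error_budget_minutes:.1f} minutes")  # ~43.2
```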
The document discusses test management for software quality assurance, including defining test management as organizing and controlling the testing process and artifacts. It covers the phases of test management like planning, authoring, execution, and reporting. Additionally, it discusses challenges in test management, priorities and classifications for testing, and the role and responsibilities of the test manager.
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ... (Kai Wähner)
Talk from Kafka Summit San Francisco 2019 (https://kafka-summit.org/sessions/event-driven-model-serving-stream-processing-vs-rpc-kafka-tensorflow/). Video recording will be available for free on the Summit website.
Event-based stream processing is a modern paradigm for continuously processing incoming data feeds, e.g. for IoT sensor analytics, payment and fraud detection, or logistics. Machine Learning / Deep Learning models can be leveraged in different ways to make predictions and improve business processes. Either analytic models are deployed natively in the application, or they are hosted in a remote model server. In the latter case you combine stream processing with the RPC / request-response paradigm instead of doing inference directly within the application. This talk discusses the pros and cons of both approaches and shows examples of stream processing vs. RPC model serving using Kubernetes, Apache Kafka, Kafka Streams, gRPC and TensorFlow Serving. The trade-offs of using a public cloud service like AWS or GCP for model deployment are also discussed and compared to local hosting for offline predictions directly “at the edge”.
Key takeaways
• Machine Learning / Deep Learning models can be used in different ways to do predictions. Scalability and loose coupling are important success factors
• Stream processing vs. RPC / Request-Response for model serving has many trade-offs – learn about alternatives and best practices for your different scenarios
• Understand the alternatives and trade-offs of model deployment in modern infrastructures like Kubernetes or Cloud Services like AWS or GCP
• See live demos with Java, gRPC, Apache Kafka, KSQL and TensorFlow Serving to understand the trade-offs
Postman is an API development tool that allows users to design, manage, run, test, document, and share APIs. It provides features like request building, documentation, environments, test automation, and collaboration. Alternatives include Paw, Insomnia, command line tools like cURL, and services from Apigee and Apiary. The document recommends using any tool that helps share APIs, especially for complex projects and team collaboration.
Performance Engineering Case Study V1.0 (sambitgarnaik)
This document discusses performance testing solutions and services offered by IonIdea. It provides an overview of IonIdea's performance testing tools for load testing, performance testing, and monitoring application and infrastructure performance. It also describes IonIdea's testing services such as performance testing, test automation consulting, and outsourced testing. Finally, it presents a case study example of how IonIdea used performance triage techniques including profiling and load testing to identify and address performance issues for an online banking application.
In this session, we will learn about the TeamCity CI server. We will look at the different options available and how we can set up a CI pipeline using TeamCity.
Behavior-driven development is the process of exploring, discovering, defining and driving the desired behavior of a software system by using conversation, concrete examples and automated tests.
Load Testing Best Practices: Application complexity is increasing, and the requirements for web performance are growing ever more stringent. Learn more about the three major types of load testing, determine which you need, and how to conduct them.
OpenFaaS is a serverless framework that allows users to build and run applications without managing servers. It uses Docker and Kubernetes to package any process as a serverless function. Some key advantages of OpenFaaS include only paying for necessary resources, not needing to manage servers, and automatic scaling of application resources. Functions as a Service (FaaS) offers the ability to create single use functions with complete server abstraction for developers. The presentation then demonstrates example functions and the project structure used by OpenFaaS.
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd (Kai Wähner)
Microservice architectures are not a free lunch! Microservices need to be decoupled, flexible, operationally transparent, data aware and elastic. Most material from recent years only discusses point-to-point architectures with inflexible and non-scalable technologies like REST / HTTP. This video takes a look at cutting-edge technologies like Apache Kafka, Kubernetes, Envoy, Linkerd and Istio to implement a cloud-native service mesh that solves these challenges and brings microservices to the next level of scale, speed and efficiency.
Key takeaways:
- Apache Kafka decouples services, including event streams and request-response
- Kubernetes provides a cloud-native infrastructure for the Kafka ecosystem
- Service Mesh helps with security and observability at ecosystem / organization scale
- Envoy and Istio sit in the layer above Kafka and are orthogonal to the goals Kafka addresses
Blog post: http://www.kai-waehner.de/blog/2019/09/24/cloud-native-apache-kafka-kubernetes-envoy-istio-linkerd-service-mesh
Video recording of this slide deck: https://youtu.be/Us_C4RFOUrA
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys... (StreamNative)
The document discusses Toast's adoption and use of Apache Pulsar for asynchronous messaging in their microservices architecture. It describes how they built a "Pulsar Toggle" leveraging Envoy proxy to enable blue/green deployments of Pulsar consumers. The Pulsar Toggle allows consumers to be paused and resumed based on their status in the Envoy control plane, improving the reliability and usability of deploying changes to Pulsar-based services. Toast has seen increased adoption of Pulsar and benefits from its stability and scalability.
Since its first 1.12 release in July 2016, Docker Swarm Mode has matured into a capable clustering and scheduling tool with which IT administrators and developers can easily establish and manage a cluster of Docker nodes as a single virtual system. Swarm mode integrates the orchestration capabilities of Docker Swarm into Docker Engine itself and lets administrators and developers add or remove container instances as computing demands change. With sophisticated but easy-to-implement features like built-in service discovery, routing mesh, secrets, a declarative service model, service scaling, desired state reconciliation, scheduling, filters, a multi-host networking model, load balancing, rolling updates and more, Docker 17.06 is ready for production use today. Join this webinar organised by Docker Izmir to get familiar with the current Swarm Mode capabilities and functionality across heterogeneous environments.
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka... (confluent)
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost – if you are not careful. You will learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system – we’ll show you how you can build the system to prove this too.
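On the producer side, the settings most often associated with not losing acknowledged messages can be sketched briefly. This is a generic example using the confluent-kafka Python client with made-up broker addresses and topic name, not code from the talk; broker-side settings such as replication factor and min.insync.replicas matter just as much but live in the cluster configuration:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker1:9092,broker2:9092",  # placeholder brokers
    "acks": "all",               # wait for all in-sync replicas before acknowledging
    "enable.idempotence": True,  # retries won't introduce duplicates
    "retries": 5,                # retry transient errors instead of dropping the message
})


def on_delivery(err, msg):
    # If err is set, the write was never durably acknowledged and must be handled.
    if err is not None:
        print(f"Delivery failed for key {msg.key()}: {err}")


producer.produce("transactions", key=b"txn-42", value=b'{"amount": 100}',
                 callback=on_delivery)
producer.flush()  # block until outstanding messages are delivered or have failed
```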
"API Testing: The heart of functional testing" with Bj Rollison (TEST Huddle)
View webinar: http://www.eurostarconferences.com/community/member/webinar-archive/webinar-81-api-testing-the-heart-of-functional-testing
An API, or Application Programming Interface, is a collection of functions that provide much of the functional capability in complex software systems. Most customers are accustomed to interacting with a graphical user interface on the computer, but many do not realize that much of a program's functionality comes from APIs in the operating system or the program's dynamic-link libraries (DLLs). So, if the business logic or core functionality is exposed via an API, and we want to find functional bugs sooner, then API testing may be an approach that provides additional value in your overall test strategy. Additionally, API testing can start even before the user interface is complete, so functional capabilities can be tested while designers are still hashing out the "look and feel." API testing will not replace testing through the user interface, but it can augment your test strategy and provide a solid foundation of automated tests that increase your confidence in the functional quality of your product.
The document discusses migrating a database table's column from storing percentage values in the range of 0 to 100 to storing decimal values in the range of 0 to 1, without downtime or changing the database schema. It proposes doing this by adding a new column to store the decimal values, populating it from the existing percentage column, then dropping the original column and modifying the new one to be the main value column. This allows applications to continue reading and writing as normal during the migration.
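The add / backfill / switch / drop sequence described above is the classic expand-and-contract migration. A toy sketch with SQLite in Python, using invented table and column names; in practice each step ships as its own deploy so readers and writers are never broken mid-migration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, discount_pct REAL)")
conn.execute("INSERT INTO listings VALUES (1, 25.0), (2, 60.0)")

# Step 1 (expand): add the new column alongside the old one.
conn.execute("ALTER TABLE listings ADD COLUMN discount_rate REAL")

# Step 2 (backfill): derive the new values from the old ones (0-100 -> 0-1).
conn.execute("UPDATE listings SET discount_rate = discount_pct / 100.0")

# Step 3: application code is switched to read and write discount_rate.

# Step 4 (contract): once nothing references the old column, drop it.
# (DROP COLUMN needs SQLite 3.35+; older versions require a table rebuild.)
conn.execute("ALTER TABLE listings DROP COLUMN discount_pct")

print(conn.execute("SELECT id, discount_rate FROM listings").fetchall())
# [(1, 0.25), (2, 0.6)]
```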
Continuous Deployment at Etsy: A Tale of Two Approaches (Ross Snyder)
1. Etsy has transitioned from infrequent deployments that took weeks of work and often broke the site, to deploying up to 25 times per day with near effortless deploys.
2. By deploying frequently with small code changes and thorough testing, the probability and severity of degradations is reduced, allowing issues to be detected and resolved quickly.
3. Etsy's continuous deployment approach enables rapid experimentation and improvement through frequent analysis of deployment outcomes and re-examination of assumptions.
Principles and Practices in Continuous Deployment at Etsy (Mike Brittain)
This document discusses principles and practices of continuous deployment at Etsy. It describes how Etsy moved from deploying code changes every 2-3 weeks with stressful release processes, to deploying over 30 times per day. The key principles that enabled this are innovating continuously, resolving scaling issues quickly, minimizing recovery time from failures, and prioritizing employee well-being over stressful releases. Automated testing, deployment to staging environments, dark launches, and extensive monitoring allow for frequent, low-risk deployments to production.
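Dark launches of the sort mentioned above are usually driven by configuration flags that ramp a new code path up gradually. The sketch below is illustrative only (the flag name, ramp percentage, and hashing scheme are invented), not Etsy's actual flag system:

```python
import hashlib

# Hypothetical flag configuration, deployed like any other config change.
FLAGS = {
    "new_checkout_flow": {"enabled": True, "ramp_percent": 5},
}


def feature_enabled(flag_name: str, user_id: int) -> bool:
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Hash the user id so each user gets a stable answer as the ramp percentage grows.
    bucket = int(hashlib.sha1(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["ramp_percent"]


if feature_enabled("new_checkout_flow", user_id=12345):
    pass  # run the dark-launched code path
else:
    pass  # keep serving the existing path
```

Ramping a flag from 1% to 100% over several small deploys is what keeps each individual change low-risk.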
Databases create a real challenge for automation and dealing with database deployments is a complex process. Databases contain our most valuable information, business data, which must be preserved and protected at all costs and yet the automation processes for database deployment are not widely adopted.
I'm going to cover something which could be seen as essential for Cassandra but which hasn't gotten much attention in the Cassandra community and literature. It's schema migrations--how you go about pushing out and versioning changes to your keyspace and table definitions across environments. This is an area that has established solutions in the relational database world, with tools like Liquibase(http://www.liquibase.org/) and Flyway (http://flywaydb.org/) and in web frameworks like Rails and Grails.
I'll explain the different types of migrations but then focus, for most of the talk, on schema migrations. I'll explain how schema migrations have been done in the Cassandra community and the roadblocks teams have faced trying to use Liquibase and Flyway to manage Cassandra migrations.
Then I'll share an elegant, lightweight schema migrations system that we at GridPoint built on top of Flyway. I'll use our system as a context for discussing schema migration best practices for Cassandra and the various choices teams have for their migrations and table definitions, including when NOT to use a tool like Flyway. I'll also touch on the other types of migrations besides keyspace and table definitions that can be versioned and driven off source control.
How to Get to Second Base with Your CDN (Mike Brittain)
Tips on how to improve how you use your CDN. Condensed from a lot of material, this talk was crammed into 20 minutes.
More info available at http://mikebrittain.com.
Details on how we capture application data in our access and error logs, as well as how to generate quick reports and graphs from these logs.
This talk was presented at O'Reilly's Velocity Online Conference on October 26, 2011.
Michael Kjellman, Software Engineer at Barracuda Networks, has offered to present on his experiences with Apache Cassandra.
Come learn about:
• Continuous Deployments with Cassandra
• Upgrading Cassandra
• When Upgrades Go Wrong
• Coding Complexity Moved to Operations (How to Prepare and Plan)
• Why 'Apt-get/Yum Install Cassandra' is a bad idea
• Why You Should Treat Cassandra’s Code like it's Your Own
Advanced Topics in Continuous Deployment (Mike Brittain)
Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.
http://www.etsy.com/careers
The document discusses how exponential growth and decay models can be applied to many real-world phenomena, from the spread of bacteria and fungus to the growth of social networks and e-commerce companies. It provides examples of how concepts like doubling time, half-life, word of mouth effects, and time delays can exponentially impact various systems over time if not managed properly. The key message is that exponential behavior is more common and influential than often realized.
How do you continue to ship 50 times a day when you're constantly hiring more engineers? How can you continue when every day you write more tests that need to be run on every commit? This talk will cover how to scale up continuous integration and continuous deployment infrastructure, for teams as small as a handful of engineers and as large as hundreds of engineers.
This document summarizes a workshop about DevOps practices like continuous integration, infrastructure as code, and automated deployment. It discusses:
1. Using continuous integration to automatically build and test code changes.
2. Defining infrastructure using configuration files to enable consistent and automated environments from development to production.
3. Automating deployment through tools that rebuild servers from code and data to enable rapid, repeatable releases with minimal downtime.
The document provides examples using tools like Puppet, AWS, and CloudFormation to demonstrate these practices on a sample Twitter application. It emphasizes how automation enables faster delivery of features while maintaining production stability.
Metrics-driven engineering is practiced at Etsy. Engineers build applications and also manage monitoring tools like Graphite and Ganglia to track metrics and visualize logs and events. Over 16,000 metrics are tracked in Graphite, along with logs, to provide visibility into application health and correlate it with deployments and other events. Dashboards are used to mix and match metrics and provide a high-level view of site performance and validation.
This document discusses metrics-driven engineering practices at Etsy including collecting and visualizing business, application, and system metrics to gain visibility and make data-driven decisions. Key points include using tools like Ganglia, Graphite, Splunk, Logster, and StatsD to monitor metrics on clusters, applications, logs, and more. The metrics provide insights on site traffic, feature usage, server health, code deployments, and errors to help optimize performance, detect and address issues, and plan infrastructure needs.
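StatsD itself just accepts plain-text metrics over UDP, which is what makes instrumenting application code this cheap. A minimal sketch using only the Python standard library; the host, port, and metric names are illustrative:

```python
import socket
import time

STATSD_ADDR = ("127.0.0.1", 8125)  # StatsD's conventional UDP port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def incr(metric: str, value: int = 1) -> None:
    sock.sendto(f"{metric}:{value}|c".encode(), STATSD_ADDR)    # counter


def timing(metric: str, ms: float) -> None:
    sock.sendto(f"{metric}:{ms:.1f}|ms".encode(), STATSD_ADDR)  # timer


start = time.monotonic()
# ... handle a request ...
incr("web.checkout.requests")
timing("web.checkout.response_time", (time.monotonic() - start) * 1000)
```

Because the sends are fire-and-forget UDP, instrumentation adds effectively no latency to the request path, which is why teams can afford to track thousands of metrics.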
Web Performance Culture and Tools at Etsy (Mike Brittain)
Mike Brittain presented on web performance culture and tools at Etsy. He discussed how Etsy focuses on performance to improve business metrics like conversion rates and page views. Engineers use tools like logging, Logster, Graphite, StatsD, and custom dashboards to measure performance. They have processes for continuous deployment, data-driven development and prioritizing optimizations. The tools and focus on measurement help Etsy engineers improve site stability and user experience.
This talk presents a comprehensive analysis of TLS in the SMTP world. We scanned over 20 million unique email recipient domains and analyzed TLS (X.509) certificates to measure overall STARTTLS deployment quality. We discovered a wealth of information that was previously unknown. The analysis will provide a good baseline in terms of STARTTLS and TLS certificates used in SMTP.
Scan tool: https://prbinu.github.io/tls-scan
The Hard Problems of Continuous Deployment (Timothy Fitz)
This document discusses the challenges of continuous deployment (CD) for data, mobile, and scale. For data, CD is made difficult by slow data updates/movement and schemas living outside code. Solutions include applying only cheap changes, applying changes to standby databases, and blue/green deploys. For mobile, users must opt-in to updates and app store submissions take time, but tools exist to help. For scale, availability, performance, and developer happiness must be maintained as user counts and tests increase. Techniques include fast tests, parallel testing, hardware scaling, and defining CD roles and processes.
AppSec++ Take the best of Agile, DevOps and CI/CD into your AppSec Program (Matt Tesauro)
This document discusses how to incorporate Agile, DevOps, and CI/CD principles into an application security (AppSec) program through the use of AppSec pipelines. It describes how Pearson created an AppSec pipeline to help optimize their AppSec team's resources, drive consistency, increase visibility, and reduce friction between development and security teams. The document advocates experimenting with AppSec pipelines to continuously improve processes through techniques like integrating Docker containers and writing security tests.
Infrastructure Continuous Delivery using CloudFormation (joehack3r)
How we continually update our CloudFormation stacks utilizing GitHub, Jenkins, and a custom Python script. This allows us to follow the practice of treating infrastructure as code and continuous delivery.
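The custom script itself isn't included, but the core of such a flow is an update-stack call followed by waiting for the result. A minimal hypothetical sketch with boto3 (the stack name, region, and template path are placeholders, and this is not the script described above):

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# The template lives in version control (GitHub) alongside application code.
with open("infrastructure/app-stack.yaml") as f:
    template_body = f.read()

# A CI job (e.g. Jenkins) runs this after a template change is merged.
# Note: update_stack raises an error if there are no changes to apply.
cfn.update_stack(
    StackName="app-stack",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],
)

# Block until the update succeeds or the stack rolls back.
cfn.get_waiter("stack_update_complete").wait(StackName="app-stack")
print("Stack update complete")
```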
The practical implementation of Continuous Delivery at Etsy, and how it enables the engineering team to build features quickly, refactor and change architecture, and respond to problems in production.
Presented at GOTO Aarhus 2012.
Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.
http://www.etsy.com/careers
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre... (Amazon Web Services)
Some of the best businesses today are deploying their code dozens of times a day. How? By making heavy use of automation, smart tools, and repeatable patterns to get process out of the way and keep the workflow moving. Come to this session to learn how you can do this too, using services such as AWS OpsWorks, AWS CloudFormation, Amazon Simple Workflow Service, and other tools. We'll discuss a number of different deployment patterns, and what aspects you need to focus on when working toward deployment automation yourself.
RightScale Webinar: January 13, 2011 – Watch this webinar for a look behind the scenes as we discuss ServerTemplates and how they differ from alternative approaches.
Working Software Over Comprehensive Documentation (Andrii Dzynia)
This document provides information on various tools that can be used for agile software development and testing. It discusses tools for user stories, project planning, documentation, testing, reports, and session-based test management. Various options are presented for each category such as Excel, JIRA, Confluence, and specialized agile tools.
[RHFSeoul2017] 6 Steps to Transform Enterprise Applications (Daniel Oh)
The document provides a 6 step approach to transforming enterprise applications:
1. Re-organizing to DevOps;
2. Implementing self-service, on-demand infrastructure;
3. Automating deployments using tools like Puppet, Chef, and Kubernetes;
4. Establishing continuous integration and deployment pipelines;
5. Adopting advanced deployment techniques like blue-green deployments;
6. Moving to a microservices architecture.
This document provides best practices for developing applications in the cloud. It discusses recommendations such as limiting HTTP traffic by optimizing assets, using persistent storage instead of treating the file system as persistent, pushing state to clients or centralized services instead of relying on server-side state, automating deployments, and performing zero-downtime upgrades through techniques like blue-green deployments. The document also recommends avoiding vendor lock-in, separating environments, communicating asynchronously, and scaling applications dynamically based on metrics.
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A... (DevOpsDays Tel Aviv)
One of the critical factors for development velocity is software correctness. Our ability to develop and ship new features fast is bounded by our ability to validate several aspects of each change: Does the feature meet the requirements? How does the feature affect existing code, and how might it affect the production environment? As the codebase grows and new features are added, our productivity naturally decreases, and our need for stronger guarantees of quality and correctness increases.
In this talk, I’ll focus on testing environments: why developers need a self-serve platform to create a full functioning environment on-demand, how such environments should be managed, and how can one restore part of the lost velocity. I’ll cover an internal system we use at AppsFlyer called ‘Namespaces’ that addresses the issue with the help of Mesos / Marathon, Docker, Traefik, and Consul.
This document discusses various topics related to developing web apps, including HTML5, responsive design, touch events, offline capabilities, and debugging tools. It provides links to resources on HTML5 features like media queries, SVG, web workers, and the page visibility API. It also covers techniques for adapting content like responsive web design, progressive enhancement, and server-side adaptation. Mobile browser stats and popular devices on Douban are mentioned. Frameworks like Bootstrap and tools like Weinre for debugging mobile apps are referenced.
This document discusses various techniques for making web applications work offline and with unreliable network connections, including:
- The application cache manifest which allows specifying cached resources to work offline
- Issues with the current manifest specification and potential enhancements
- The window.applicationCache API for caching resources and monitoring cache status
- Detecting online/offline status using the navigator.onLine property
In short, it covers approaches for offline web applications using the application cache manifest, the applicationCache API, and the navigator.onLine property.
AD113 Speed Up Your Applications w/ Nginx and PageSpeed (edm00se)
My slide deck from my session, AD113: Speed Up Your Applications with Nginx + PageSpeed, at MWLUG 2015 in Atlanta, GA at the Ritz-Carlton.
For more, see:
- https://edm00se.io/self-promotion/mwlug-ad113-success
- https://github.com/edm00se/AD113-Speed-Up-Your-Apps-with-Nginx-and-PageSpeed
The Ember.js Framework - Everything You Need To Know (All Things Open)
All Things Open 2014 - Day 2
Thursday, October 23rd, 2014
Yehuda Katz
Founder of Tilde
Front Dev 1
The Ember.js Framework - Everything You Need To Know
Just In Time Scalability Agile Methods To Support Massive Growth Presentation (Timothy Fitz)
Eric Ries and Chris Hondl's MySQL conference presentation on Just In Time Scalability. http://startuplessonslearned.blogspot.com/2008/09/just-in-time-scalability.html
Just In Time Scalability Agile Methods To Support Massive Growth Presentation (Eric Ries)
The document discusses techniques for scaling web applications to support massive growth using agile methods. It describes IMVU's architecture transformation from a small site built on open source tools to a large scaled architecture. Key techniques discussed include continuous integration and deployment, incremental changes, monitoring metrics, caching, sharding databases, and evolving data designs without downtime.
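Of the techniques listed, sharding is the easiest to show in a few lines: route every row for a given key to the same database by a stable function of that key. A toy sketch (the shard names and simple modulo scheme are illustrative; real systems often prefer ranges or consistent hashing so shards can be added later):

```python
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id: int) -> str:
    # All reads and writes for a given user land on the same shard.
    return SHARDS[user_id % len(SHARDS)]


assert shard_for(42) == shard_for(42)   # routing is stable
print(shard_for(42), shard_for(1001))   # users_db_2 users_db_1
```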
How to measure everything - a million metrics per second with minimal develop... (Jos Boumans)
Krux is an infrastructure provider for many of the websites you use online today, like NYTimes.com, WSJ.com, Wikia and NBCU. For every request on those properties, Krux will get one or more as well. We grew from zero traffic to several billion requests per day in the span of 2 years, and we did so exclusively in AWS.
To make the right decisions in such a volatile environment, we knew that data is everything; without it, you can't possibly make informed decisions. However, collecting it efficiently, at scale, at minimal cost and without burdening developers is a tremendous challenge.
Join me in this session to learn how we overcame this challenge at Krux; I will share with you the details of how we set up our global infrastructure, entirely managed by Puppet, to capture over a million data points every second on virtually every part of the system, including inside the web server, user apps and Puppet itself, for under $2000/month using off-the-shelf Open Source software and some code we've released as Open Source ourselves. In addition, I'll show you how you can take (a subset of) these metrics and send them to advanced analytics and alerting tools like Circonus or Zabbix.
This content will be applicable for anyone collecting or desiring to collect vast amounts of metrics in a cloud or datacenter setting and making sense of them.
Real-World Pulsar Architectural Patterns (Devin Bost)
This presentation covers Real-World Pulsar Architectural Patterns involving Distributed Caching and Distributed Tracing. We also cover the use of Apache Ignite, Jaeger, Apache Flink, and many other technologies, as well as industry best-practices.
Everything is Awesome - Cutting the Corners off the Web (James Rakich)
The web is awesome despite its detractors. But we can't forget our fundamentals when we're trying to forge ahead with new tech. This talk is about how to approach the building blocks of the web in a way that takes advantage of their strengths and avoids their weaknesses.
Do you need Ops in your new startup? If not now, then when? And...what is Ops?
Learn how to scale ruby-based distributed software infrastructure in the cloud to serve 4,000 requests per second, handle 400 updates per second, and achieve 99.97% uptime – all while building the product at the speed of light.
Unimpressed? Now try doing the above altogether without the Ops team, while growing your traffic 100x in 6 months and deploying 5-6 times a day!
It could be a dream, but luckily it's a reality that could be yours.
The document provides an overview of Google App Engine, a platform for developing and hosting web applications on Google's infrastructure. It discusses the different language runtimes, services, and development tools available on App Engine and highlights some example applications that have been built on the platform. The document also shares experiences from Latin American users and details some new features recently added to App Engine like cursors, task queues, and cron jobs.
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014 (Amazon Web Services)
Log data contains some of the most valuable raw information you can gather and analyze about your infrastructure and applications. Amid the mess of confusing lines of seemingly random text can be hints about performance, security, flaws in code, user access patterns, and other operational data. Without the proper tools, finding insights in these logs can be like searching for a hay-colored needle in a haystack. In this session you learn what practices and patterns you can easily implement that can help you better understand your log files. You see how you can customize web logs to add more information to them, how to digest logs from around your infrastructure, and how to analyze your log files in near real time.
8. DECEMBER 2012
1.5 Billion page views
$117 Million of goods sold
6 Million items sold
Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/
11. Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery. Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead.
~ Wikipedia
credit: Stewart, redgen (flickr)
14. Then vs. Now
Then: 2009, just before we started using CD
Now: 2010-today
15. Then vs. Now
Then: 6-14 hours; a “Deployment Army”; a special event, highly orchestrated
Now: 15 mins; 1 person; part of everyday workflow
16. Then vs. Now
Then: blocked for 6-14 hours; 6+ hours to redeploy
Now: blocked for 15 minutes; 15 minutes to redeploy
17. Then vs. Now
Then: release branch, database schemas, data transforms, packaging, rolling restarts, cache purging, scheduled downtime
Now: mainline, minimal linking and building, rsync, site up
33. What’s in a deploy?
Small incremental changes to the application
New classes, methods, controllers
Graphics, stylesheets, templates
Copy/content changes
Turning flags on, off, or % ramp up
34. Low MTTR (response times)
Latent bugs and security holes
Traffic management, load shedding
Adding and removing infrastructure
Tweaking config flags or releasing patches.
43. Our web application is largely monolithic.
Etsy.com, Support & Back-office tools,
Developer API, Gearman (async work)
44. Our web application is largely monolithic.
Etsy.com, Support & Back-office tools,
Developer API, Gearman (async work)
PHP, Apache, Memcache
45. External “services” are not deployed with
the main application.
e.g. Databases, Search, Photo storage, Payments
46. External “services” are not deployed with
the main application.
e.g. Databases, Search, Photo storage, Payments
MYSQL (schema changes)
PCI PROXY CACHE (controlled access)
FILERS, AMAZON S3 (specialized infra.)
SOLR, JVM (rolling restarts)
47. For every config flag, there are two states
we can support — present and future.
48. For every config flag, there are two states
we can support — present and future.
... or past and present.
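The slides don't show the flag mechanism itself, so here is a minimal sketch of what a flag check supporting both states might look like, assuming a hypothetical $config array and helper function (not Etsy's actual code):

```php
<?php
// Hypothetical flag store (not Etsy's actual config system): in practice the
// flag values live in config that can be changed independently of feature code.
$config = ['read_prefs_from_users_table' => 'off'];

function flag_is_on(array $config, string $flag): bool
{
    return ($config[$flag] ?? 'off') === 'on';
}

// Both states ship in the same deploy; flipping the flag later needs no new code.
if (flag_is_on($config, 'read_prefs_from_users_table')) {
    echo "future path: read prefs from the new columns on users\n";
} else {
    echo "present path: read prefs from the legacy user_prefs table\n";
}
```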
51. RULE OF THUMB:
Prefer ADDs over ALTERs (non-breaking expansion)
52. 1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
53. 0. Add new version to schema
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
54. 0. Add new version to schema
Schema change to add prefs columns to “users” table.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “off”
“read_prefs_from_users_table” => “off”
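The DDL itself isn't shown in the slides; a minimal sketch of step 0, assuming two made-up pref columns and MySQL, could look like this. Per the "prefer ADDs over ALTERs" rule of thumb, it only ADDs nullable columns, so code that has never heard of them keeps working. The column names and the MIGRATION_DSN environment variable are illustrative:

```php
<?php
// Hypothetical step-0 migration (illustrative names, not Etsy's real schema).
// Non-breaking expansion: only ADD nullable columns to the existing table.
$ddl = "ALTER TABLE users
          ADD COLUMN pref_language VARCHAR(8)  NULL DEFAULT NULL,
          ADD COLUMN pref_timezone VARCHAR(32) NULL DEFAULT NULL";

if ($dsn = getenv('MIGRATION_DSN')) {
    // Apply the change when a connection string is provided ...
    $pdo = new PDO($dsn, getenv('DB_USER') ?: null, getenv('DB_PASS') ?: null);
    $pdo->exec($ddl);
} else {
    // ... otherwise just print it for review.
    echo $ddl . "\n";
}
```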
55. 1. Write to both versions
Write code for writing prefs to the “users” table.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “off”
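A rough sketch of step 1, assuming hypothetical table, column, and flag names: while the migration is in flight, every prefs write goes to both the old and the new location, each guarded by its own flag so either side can be switched off independently:

```php
<?php
// Step 1 sketch (hypothetical names): dual-write behind two separate flags.
function save_prefs(PDO $db, array $config, int $userId, string $language): void
{
    if (($config['write_prefs_to_user_prefs_table'] ?? 'off') === 'on') {
        // Old location: dedicated user_prefs table.
        $stmt = $db->prepare(
            'INSERT INTO user_prefs (user_id, pref_language) VALUES (?, ?)
             ON DUPLICATE KEY UPDATE pref_language = VALUES(pref_language)'
        );
        $stmt->execute([$userId, $language]);
    }

    if (($config['write_prefs_to_users_table'] ?? 'off') === 'on') {
        // New location: the columns added to users in step 0.
        $stmt = $db->prepare('UPDATE users SET pref_language = ? WHERE id = ?');
        $stmt->execute([$language, $userId]);
    }
}
```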
56. 2. Backfill historical data
Offline process to sync existing data from “user_prefs”
to new columns in “users”
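The talk doesn't show the backfill code; one way step 2 could be sketched, with hypothetical names, is an offline, batched job that copies existing rows from user_prefs into the new columns on users. Small batches keep lock time and replication lag manageable:

```php
<?php
// Step 2 sketch (hypothetical names): batched offline backfill of historical data.
function backfill_prefs(PDO $db, int $batchSize = 1000): void
{
    $lastId = 0;
    do {
        $stmt = $db->prepare(
            'SELECT user_id, pref_language FROM user_prefs
              WHERE user_id > ? ORDER BY user_id LIMIT ' . $batchSize
        );
        $stmt->execute([$lastId]);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            $update = $db->prepare('UPDATE users SET pref_language = ? WHERE id = ?');
            $update->execute([$row['pref_language'], $row['user_id']]);
            $lastId = (int) $row['user_id'];
        }

        usleep(50000); // brief pause between batches (an assumption, not from the talk)
    } while (count($rows) === $batchSize);
}
```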
57. 3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “staff”
58. 3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “1%”
59. 3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “5%”
60. 3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.
“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”
(“on” == “100%”)
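The slides show the flag moving through “staff”, “1%”, “5%”, and “on”, but not how those values are interpreted. One possible (hypothetical) reading of such a flag, bucketing users by id so each user stays on a consistent side of the ramp:

```php
<?php
// Step 3 sketch (hypothetical): interpret "off", "staff", "N%", or "on".
function read_from_new_table(array $config, int $userId, bool $isStaff): bool
{
    $value = $config['read_prefs_from_users_table'] ?? 'off';

    if ($value === 'on')    { return true; }
    if ($value === 'off')   { return false; }
    if ($value === 'staff') { return $isStaff; }

    // Percentage ramp: deterministic bucket by user id, so a given user
    // doesn't flip between old and new reads from request to request.
    if (preg_match('/^(\d+)%$/', $value, $m)) {
        return ($userId % 100) < (int) $m[1];
    }
    return false;
}

var_dump(read_from_new_table(['read_prefs_from_users_table' => '5%'], 3, false));
// bool(true): user 3 falls in the first 5% of buckets
```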
61. 4. Cut-off writes to old version
After running on the new table for a significant amount
of time, we can cut off writes to the old table.
“write_prefs_to_user_prefs_table” => “off”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”
62. “Branch by Abstraction”
[Diagram: controllers talk only to the Users Model (abstraction), which sits in front of both the old schema (“users” (old), “user_prefs”) and the new schema (“users”).]
http://paulhammant.com/blog/branch_by_abstraction.html
http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
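A minimal sketch of what the abstraction layer in the diagram could look like, with hypothetical class, table, and flag names (not Etsy's actual model code):

```php
<?php
// Branch-by-abstraction sketch: controllers depend only on UsersModel, and the
// config flag decides whether a read is served by the old user_prefs table or
// by the new columns on users.
class UsersModel
{
    /** @var PDO */
    private $db;

    /** @var array */
    private $config;

    public function __construct(PDO $db, array $config)
    {
        $this->db = $db;
        $this->config = $config;
    }

    public function getPrefLanguage(int $userId): ?string
    {
        $useNew = ($this->config['read_prefs_from_users_table'] ?? 'off') === 'on';
        $sql = $useNew
            ? 'SELECT pref_language FROM users WHERE id = ?'             // new schema
            : 'SELECT pref_language FROM user_prefs WHERE user_id = ?';  // old schema

        $stmt = $this->db->prepare($sql);
        $stmt->execute([$userId]);
        $value = $stmt->fetchColumn();

        return $value === false ? null : (string) $value;
    }
}
```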
63. “The Migration 4-Step”
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
65. We might remove config flags for the old version when...
It is no longer valid for the business.
It is no longer stable, maintained, or trusted.
It has poor performance characteristics.
The code is a mess, or difficult to read.
We can afford to spend time on it.
78. “Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.”
~ Fred Brooks, The Mythical Man-Month
94. More database servers in prod.
Bigger database hardware in prod.
More web servers.
Various replication schemes.
Different versions of server and OS software.
Schema changes applied at different times.
Physical hardware in prod.
More data in prod.
Legacy data (7 years of odd user states).
More traffic in prod.
Wait, I mean MUCH more traffic in prod.
Fewer elves.
Faster disks (SSDs) in prod.
95. Using a MySQL database in dev for an application that will be running
on Oracle in production: Priceless
106. SERVER METRICS
Apache requests/sec, Busy processes,
CPU utilization, Script exec time (med. & 95th)
APPLICATION METRICS
Logins, Registrations, Checkouts,
Listings created, Forum posts
Time and event correlated.
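The slides name the metrics but not the collection pipeline. As a purely hypothetical illustration, application events like logins and checkouts can be counted with a fire-and-forget UDP packet in a StatsD-style "name:value|type" format, which is cheap enough to emit from the request path:

```php
<?php
// Hypothetical application-metric emitter (the talk doesn't specify the tooling):
// best-effort UDP, so counting an event never slows down or fails the request.
function count_event(string $metric, int $value = 1): void
{
    $socket = @fsockopen('udp://127.0.0.1', 8125, $errno, $errstr, 0.1);
    if ($socket === false) {
        return; // metrics are best-effort; drop silently if the collector is away
    }
    fwrite($socket, "{$metric}:{$value}|c");
    fclose($socket);
}

// e.g. alongside the business logic for the events named on the slide:
count_event('registrations');
count_event('checkouts');
```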
113. Tighten your feedback cycles
Integrate with production and validate early in cycle.
Use tools that allow you to detect issues early.
Optimize for quick response times.
Applied to both feature development and operability.
114. Thank you
... and questions?
These slides will be available later today at http://mikebrittain.com/talks
Mike Brittain
ENGINEERING DIRECTOR
@mikebrittain
mike@etsy.com