More and more startups/companies are deploying their infrastructure directly and exclusively in EC2 or similar cloud provider. With that comes a whole new set of challenges and paradigms around scalability, reliability and availability.
This talk will focus on how to leverage all the infrastructure parts of AWS, augment them with great (affordable) third party services and solid Open Source Software to create an operations environment that will scale with you, be as reliable as it can be, providing you and your peers with all the data you need to make good decisions to support (rapid) changes while letting you sleep through the night. And all that using a tiny operations team.
It may make you coffee in the morning too.
Chaos patterns - architecting for failure in distributed systemsJos Boumans
As we architect our systems for greater demands, scale, uptime, and performance, the hardest thing to control becomes the environment in which we deploy and the subtle but crucial interactions between complicated systems. Chaos Patterns help us establish and implement a virtuous cycle that let’s us both prove & improve our system along each of these dimensions before the inevitable happens.
While it may seem reckless or counter-intuitive, our experience has proven that it’s a matter of how and when (not if) we will learn about the limitations and failure modes of the system.
This is the story of the pitfalls we encountered, and how, through architecture, convention and common sense, we managed to build an infrastructure that is "Always Up" from the end user perspective and incredibly economical to build, scale & operate; using chaos testing, we learn more about how our system fails from a 10 second controlled failure than a multi-hour uncontrolled outage.
In this session we will cover various implementation techniques, available to any developer & operator, which will vastly increase the resilience of your systems and provide a superior end user experience; from optimizing your use of DNS for failure, to configuring your CDN to have your back, to synthetic responses and expected database outages.
But why stop there? Netflix has pioneered a culture and suite of tools that actively injects ‘once in a blue moon’ failures into its production systems, which lets you battle test your resilience design and let developers & operators sleep comfortably at night knowing their systems are able to handle even the worst of worst case scenarios.
One-Man Ops with Puppet & Friends.
If you're getting started in Amazon AWS here's 7 tools that will help you be successful, a few tips to make your life easier and some common pitfalls to avoid.
Which would you rather have: A rich design or a fast user experience? Users want both, but sometimes the interplay between design and performance feels like a fixed sum game: One side’s gain is the other side’s loss. Design and performance are indeed connected, but it’s more like the yin and yang. They aren’t opposing forces, but instead complement each other. Users want an experience that is rich and fast. The trick for us as designers and developers is figuring out how to do that.
The answer is to adopt an approach that considers both design and performance from the outset. With this approach, designs are conceived by teams of designers and developers working together. Developers benefit by participating in the product definition process. Designers benefit from understanding more about how designs are implemented. There’s an emphasis on early prototyping and tracking performance from the get-go.
With new metrics that focus on what a user actually sees as the page loads, we can now bridge the technical and language gaps that have hindered the seamless creation of great user experiences. In this presentation, Steve Souders, former Chief Performance Yahoo! and Google head performance engineer, explains how promoting a process that brings design and performance together at the beginning of a project helps deliver a web experience that is both fast and rich.
Slides from the Softwerkskammer Chemnitz meetup on Tuesday, 14th of September on
- chaos engineering
- software resilience
- resilience patterns
- execution of chaos experiments
- creation of chaos backlog
- finding weaknesses in your service landscape
- dark debt
- grey failure
Chaos patterns - architecting for failure in distributed systemsJos Boumans
As we architect our systems for greater demands, scale, uptime, and performance, the hardest thing to control becomes the environment in which we deploy and the subtle but crucial interactions between complicated systems. Chaos Patterns help us establish and implement a virtuous cycle that let’s us both prove & improve our system along each of these dimensions before the inevitable happens.
While it may seem reckless or counter-intuitive, our experience has proven that it’s a matter of how and when (not if) we will learn about the limitations and failure modes of the system.
This is the story of the pitfalls we encountered, and how, through architecture, convention and common sense, we managed to build an infrastructure that is "Always Up" from the end user perspective and incredibly economical to build, scale & operate; using chaos testing, we learn more about how our system fails from a 10 second controlled failure than a multi-hour uncontrolled outage.
In this session we will cover various implementation techniques, available to any developer & operator, which will vastly increase the resilience of your systems and provide a superior end user experience; from optimizing your use of DNS for failure, to configuring your CDN to have your back, to synthetic responses and expected database outages.
But why stop there? Netflix has pioneered a culture and suite of tools that actively injects ‘once in a blue moon’ failures into its production systems, which lets you battle test your resilience design and let developers & operators sleep comfortably at night knowing their systems are able to handle even the worst of worst case scenarios.
One-Man Ops with Puppet & Friends.
If you're getting started in Amazon AWS here's 7 tools that will help you be successful, a few tips to make your life easier and some common pitfalls to avoid.
Which would you rather have: A rich design or a fast user experience? Users want both, but sometimes the interplay between design and performance feels like a fixed sum game: One side’s gain is the other side’s loss. Design and performance are indeed connected, but it’s more like the yin and yang. They aren’t opposing forces, but instead complement each other. Users want an experience that is rich and fast. The trick for us as designers and developers is figuring out how to do that.
The answer is to adopt an approach that considers both design and performance from the outset. With this approach, designs are conceived by teams of designers and developers working together. Developers benefit by participating in the product definition process. Designers benefit from understanding more about how designs are implemented. There’s an emphasis on early prototyping and tracking performance from the get-go.
With new metrics that focus on what a user actually sees as the page loads, we can now bridge the technical and language gaps that have hindered the seamless creation of great user experiences. In this presentation, Steve Souders, former Chief Performance Yahoo! and Google head performance engineer, explains how promoting a process that brings design and performance together at the beginning of a project helps deliver a web experience that is both fast and rich.
Slides from the Softwerkskammer Chemnitz meetup on Tuesday, 14th of September on
- chaos engineering
- software resilience
- resilience patterns
- execution of chaos experiments
- creation of chaos backlog
- finding weaknesses in your service landscape
- dark debt
- grey failure
How many photo carousels have you built? Date pickers? Dynamic tables and charts? Wouldn't it be great if there was a way to make these custom elements encapsulated and reusable? Welcome to Web Components! The building blocks are well known: HTML templates, custom elements, HTML imports, and shadow DOM. It's fairly easy to build simple examples. But what happens when performance degrades? Join this discussion of the synchronous and asynchronous nature of web components, and how they can impact the rendering of the entire page.
Don't Just do Agile - AgileDC ConferenceSimon Storm
Are you struggling to implement Agile at your company? What could be better than to learn from someone who has done it wrong over and over! We want to share our experiences pioneering Agile at a FinTech company. After multiple attempts and through sheer stubbornness, we were we able to get it right and improve our release pace by 650% annually. We will walk through where we went wrong, what we did right, and why we now understand that Agile cannot be successful without profound collaboration, Continuous Delivery, a DevOps culture and a desire to continuously improve.
Presentation at WebPerfDays Amsterdam, May 18 2013.
This newish browser API can be used to gain insight in the load time of individual page resources. Does the API behave consistently and as expected? Short answer: no, not really. Long answer: view the presentation ;-)
Every URL visited from the Facebook iPhone app is done through a webview. Same with Twitter. Even if you don't have a mobile app, your website gets a lot of traffic from webviews. And yet, testing on webviews is challenging. There are significant performances differences between UIWebView vs WkWebView, and similarly for Android webview vs the new Chromium webview. And what about home screen apps?! In this talk, Steve Souders discusses the differences across webviews and how that affects performance of mobile web apps.
Your visitors interact with content, not with your website. Content consistency is crucial to a successful user experience. Re-publishing is one option but it’s an inside-out action that relies on the authority controlling where the information goes. An API frees your data and the responsibility to where it is published and accessed. Mobile is a major consumer for your API but not every API is setup to handle the mass of requests coming from those devices. Learn how to mobile devices consume API’s with limited or low bandwidth and how to to tailor your API to be as efficient and effective as possible.
http://environmentsforhumans.com/2012/doteduguru-summit/
How to make successful use of the cloud for your software startup. Based on 4 years of using various cloud services. Includes advice, war stories, and best practices.
Presented at CoderFaire Atlanta 2013.
How many photo carousels have you built? Date pickers? Dynamic tables and charts? Wouldn't it be great if there was a way to make these custom elements encapsulated and reusable? Welcome to Web Components! The building blocks are well known: HTML templates, custom elements, HTML imports, and shadow DOM. It's fairly easy to build simple examples. But what happens when performance degrades? Join this discussion of the synchronous and asynchronous nature of web components, and how they can impact the rendering of the entire page.
Don't Just do Agile - AgileDC ConferenceSimon Storm
Are you struggling to implement Agile at your company? What could be better than to learn from someone who has done it wrong over and over! We want to share our experiences pioneering Agile at a FinTech company. After multiple attempts and through sheer stubbornness, we were we able to get it right and improve our release pace by 650% annually. We will walk through where we went wrong, what we did right, and why we now understand that Agile cannot be successful without profound collaboration, Continuous Delivery, a DevOps culture and a desire to continuously improve.
Presentation at WebPerfDays Amsterdam, May 18 2013.
This newish browser API can be used to gain insight in the load time of individual page resources. Does the API behave consistently and as expected? Short answer: no, not really. Long answer: view the presentation ;-)
Every URL visited from the Facebook iPhone app is done through a webview. Same with Twitter. Even if you don't have a mobile app, your website gets a lot of traffic from webviews. And yet, testing on webviews is challenging. There are significant performances differences between UIWebView vs WkWebView, and similarly for Android webview vs the new Chromium webview. And what about home screen apps?! In this talk, Steve Souders discusses the differences across webviews and how that affects performance of mobile web apps.
Your visitors interact with content, not with your website. Content consistency is crucial to a successful user experience. Re-publishing is one option but it’s an inside-out action that relies on the authority controlling where the information goes. An API frees your data and the responsibility to where it is published and accessed. Mobile is a major consumer for your API but not every API is setup to handle the mass of requests coming from those devices. Learn how to mobile devices consume API’s with limited or low bandwidth and how to to tailor your API to be as efficient and effective as possible.
http://environmentsforhumans.com/2012/doteduguru-summit/
How to make successful use of the cloud for your software startup. Based on 4 years of using various cloud services. Includes advice, war stories, and best practices.
Presented at CoderFaire Atlanta 2013.
How to measure everything - a million metrics per second with minimal develop...Jos Boumans
Krux is an infrastructure provider for many of the websites you
use online today, like NYTimes.com, WSJ.com, Wikia and NBCU. For
every request on those properties, Krux will get one or more as
well. We grew from zero traffic to several billion requests per
day in the span of 2 years, and we did so exclusively in AWS.
To make the right decisions in such a volatile environment, we
knew that data is everything; without it, you can't possibly make
informed decisions. However, collecting it efficiently, at scale,
at minimal cost and without burdening developers is a tremendous
challenge.
Join me in this session to learn how we overcame this challenge
at Krux; I will share with you the details of how we set up our
global infrastructure, entirely managed by Puppet, to capture over
a million data points every second on virtually every part of the
system, including inside the web server, user apps and Puppet itself,
for under $2000/month using off the shelf Open Source software and
some code we've released as Open Source ourselves. In addition, I’ll
show you how you can take (a subset of) these metrics and send them
to advanced analytics and alerting tools like Circonus or Zabbix.
This content will be applicable for anyone collecting or desiring to
collect vast amounts of metrics in a cloud or datacenter setting and
making sense of them.
Fast Slim Correct: The History and Evolution of JavaScript.John Dalziel
A look back at how JavaScript has evolved over the past 18 years - how it broke out of the browser and can now be found in the most unexpected places. Presented at Worthing Digital, 7th Nov 2013.
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014Amazon Web Services
Log data contains some of the most valuable raw information you can gather and analyze about your infrastructure and applications. Amid the mess of confusing lines of seemingly random text can be hints about performance, security, flaws in code, user access patterns, and other operational data. Without the proper tools, finding insights in these logs can be like searching for a hay-colored needle in a haystack. In this session you learn what practices and patterns you can easily implement that can help you better understand your log files. You see how you can customize web logs to add more information to them, how to digest logs from around your infrastructure, and how to analyze your log files in near real time.
Quick overview on the journey of storage w.r.t containers and emerging trends in container storage. A look at the requirements and architectural patterns of the new storage startups. Presented at the Docker meetup#24 at #GoJekIndia
As browsers explode with new capabilities and migrate onto devices users can be left wondering, “what’s taking so long?” Learn how HTML, CSS, JavaScript, and the web itself conspire against a fast-running application and simple tips to create a snappy interface that delight users instead of frustrating them.
The Server Side of Responsive Web DesignDave Olsen
Responsive web design has become an important tool for front-end developers as they develop mobile-optimized solutions for clients. Browser-detection has been an important tool for server-side developers for the same task for much longer. Unfortunately, both techniques have certain limitations. Depending on project requirements, team make-up and deployment environment combining these two techniques might lead to intriguing solutions for your organization. We'll discuss when it makes sense to take this extra step and we'll explore techniques for combining server-side technology, like server-side feature-detection, with your responsive web designs to deliver the most flexible solutions possible.
Slides for an introductory workshop on cloud computing for a web app developer audience at FOWA Miami 09 (http://events.carsonified.com/fowa/2009/miami/workshops#workshop_36)
RESS: An Evolution of Responsive Web DesignDave Olsen
Responsive web design has become an important tool for front-end developers as they develop mobile-optimized solutions for clients. Browser-detection has been an important tool for server-side developers for the same task for much longer. Unfortunately, both techniques have certain limitations. I’ll show how both front-end and server-side developers can take advantage of the new technique called RESS (Responsive Web Design with Server Side Components) that aims to be combine the best of both worlds for delivering mobile-optimized content.
Automating Oracle Database deployment with Amazon Web Services, fabric, and botomjbommar
Have credit card, need database? In this talk, I'll show you how to deploy your own Oracle 11gR2 sandbox with a single keystroke (and I don't mean RDS). Along the way, we'll learn about Infrastructure-as-a-Service with boto, provisioning tools like fabric, and Oracle response files. When we're done, we'll have a repeatable, ten-minute process that can deliver a server as cheap as $5/day or as powerful as 40k IOPS and 2.6GB/s throughput. More importantly, we'll understand what the big deal about IaaS and automated provisioning really is, and how enterprise products like Oracle can still fit comfortably in the space.
Similar to Reliability & Scale in AWS while letting you sleep through the night (20)
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Reliability & Scale in AWS while letting you sleep through the night
1. ONE MAN OPS
Reliability & Scale in AWS while letting you sleep through the night
Jos Boumans - @jiboumans
http://www.fwallpaper.net/picture_pics-Sleepy-cat.html
17. AWS OUTAGE = YOUR OUTAGE
http://it.mario.wikia.com/wiki/Lakitu
18. THE RULES HAVE CHANGED
You're not in Kansas anymore
http://entreatmenot.blogspot.com/2011/04/shattered-dreams.html
19. NETWORK WILL PARTITION
And it will happen often
http://thevinylvillain.blogspot.com/2010_04_01_archive.html
20. DISK IO WILL FLUCTUATE
On a good day, it's mediocre
http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
21. IP ADDRESSES WILL CHANGE
IP lease is 8 hours
DNS TTL is 60 seconds
www.fantom-xp.com
22. INSTANCES WILL DIE
And it will always be your Database Master
http://room57.deviantart.com/art/Hangman-188353196
24. EMBRACE FAILURE
Hardware will fail. Humans will make errors.
Nature will produce thunderstorms.
http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
25. ADJUST YOUR STRATEGY
Don't bring a knife to a gun fight
http://www.flickr.com/photos/statlerhotel/6628770499/sizes/l/in/photostream/
26. DATA STORES
Some work better than others
http://gustavhoiland.com/2010/03/10/stacked-boxes/
27. RDBMS
CouchDB
BigTable Based
Dynamo Based
Master / Slave based
CAP THEOREM
Your choice: sacrifice availability or consistency.
Orange is a lie.
28. MYSQL / ORACLE VS RDS
See: Network partitioning & instances dying
29. BIGTABLE BASED STORES
HBase, Accumulo, Hypertable
Still suffer when network partitioning happens
http://www.cloudera.com/cdh4/
30. DYNAMO BASED STORES
Cassandra, Riak, DynamoDB
http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html http://aws.amazon.com/dynamodb/faqs/
31. GO HOSTED?
CouchDB, MongoDB, Riak, Cassandra, HBase
Your Latency May Vary
http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html
32. CLIENT SIDE STORAGE
Keep a copy of your users data locally
http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/ http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
33. FILE STORES
EBS vs Instance Store
http://homedezine.blogspot.com/2011/04/day-my-cat-removed-carpet-photo-studio.html
34. SIMPLE STORAGE SERVICE
S3: Arguably AWS' best feature
http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
35. TRAFFIC SHAPING
Control every part of the request
http://www.visualphotos.com/image/2x4154765/man_standing_with_traffic_cones_in_shape_of_u-turn
36. STAY LOCAL IF YOU CAN
Going off box exposes you to risks you need to mitigate
http://southshorewoman.com/issue/june-2010/article/local-character
37. CACHE WHAT YOU CAN
HTTP Responses, DB Queries, User content
Browsers have caches too!
http://theoatmeal.com/blog/charity_money
38. USE ELASTIC LOAD BALANCERS
They will save you more than once
http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
39. USE GLOBAL LOAD BALANCING
Fail over to the closest data center on region failure
41. USE A CDN
Critical items should always be available
http://kadanthuponanimidangal.blogspot.com/2010/12/blog-post_6992.html
42. MEASURE EVERYTHING
Find outliers, deviants & trends before they cause trouble
http://www.themoviedb.org/movie/629-the-usual-suspects
43. GRAPHITE, STATSD & COLLECTD
Use Statsd & Collectd for application/system metrics
Use graphite to store, aggregate & visualize
http://hostedgraphite.com/
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
44. GRAPH EVENTS
Deployments, outages, CDN reconfigurations, failed builds, etc
Anything that's important to the health of your eco system
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
45. COMPARE WEEK TO WEEK
Overlay week to week graphs using timeShift()
Quickly identifies trends and deviations from trends
http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-10
46. FORECASTING
Use Holt-Winters confidence bands
Verify that your metrics are within normal tolerance
https://github.com/ripienaar/graphite-graph-dsl/wiki/Creating-Holt-Winters-Forecasts
47. FIND INDIVIDUAL OUTLIERS
Absolute numbers mean very little
Use mean & standard deviation
http://en.wikipedia.org/wiki/File:Black_sheep-1.jpg
48. ALERT ON TRENDS
Once you go over a threshold, it's too late
Alert on unwanted trends and preemptively fix
http://sub-second.blogspot.com/2012/06/reporting-response-times-percentile.html http://aphyr.github.com/riemann/
50. SHOUT OUT: NEW RELIC
Python, Ruby, .NET, Java, PHP support
In depth profiling of your app for performance & errors.
51. CONFIGURATION MANAGEMENT
Unique snowflakes are bad
http://www.torange.us/Plants/Conifers/spruce-needles-in-hoarfrost-424.html
52. PUPPET VS CHEF
Yes.
http://puppetlabs.com/
http://www.opscode.com/chef
53. INFRASTRUCTURE AS CODE
Use different environments
Measure and report on it
http://americansingercanary.com/green.htm
54. SHOUT OUT: UBUNTU
Ubuntu + cloud-init + boto = awesome*
*I am biased
http://www.123rf.com/photo_4871141_food-pyramid-isolated-on-white.html https://github.com/krux/ops-tools
55. DEV = PRODUCTION
"I dunno, it worked on my laptop"
Instead, use vagrant
http://vagrantup.com/ http://vagrantup.com/
56. ROLL YOUR OWN AMIS
Instantly boot up new deployments
Reduce Time to Respond
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
57. CONFIDENT DEPLOYS
That human error could be yours
http://www.etsy.com/listing/37178125/stormtrooper-regrets-those-were-the
58. CONTINUOUS INTEGRATION
Ours: Github + Jenkins + FPM + apt::s3
From commit to deployable in one command http://github.com/
http://jenkins-ci.org/
https://github.com/thekad/apt-s3
https://github.com/jordansissel/fpm/wiki/
59. ONE CLICK DEPLOYMENTS
Deployments should not be exciting.
Don't create a checklist; automate & track
http://www.thegreenhead.com/2012/07/one-click-butter-cutter.php https://checkmarkable.com/
60. DARK LAUNCHES
Exercise the code without impacting the user experience
http://www.kissmetrics.com/
http://www.layoutsparks.com/pictures/moon-23 https://github.com/yahoo/boomerang/
61. SHADOW TRAFFIC
Test new code against live traffic
http://doppelthingers.tumblr.com/post/12839979386/traffic-light-shadow-hangman-and-possibly-his https://gist.github.com/3125323