Velocity 2015: Building Self-Healing Systems
Todd Minnella and Matt Solnit, SOASTA
Speaker Intro - Todd
● Director of Ops for SOASTA
● Over 25 years in IT
● Experience with both academic and enterprise computing
● Favorite operating system is Tru64
● Enjoys solving problems...but loves sleep more!
@toddminnella
tminnella@soasta.com
Speaker Intro - Matt
● VP of Engineering for SOASTA
● Started programming with Atari BASIC in elementary school
● Ops on the side :-)
● First Velocity presentation!
@msolnit
msolnit@soasta.com
Who are you? :-)
http://www.cliarthut.com/clip-arts/751/who-are-you-clip-art-751173.jpg
Agenda (1 of 2)
Part One - Theory
● Distributed Systems Challenges
● Mitigating Failure Impact
● Benefits and Risks
● Testing Requirements
● Methodology
Agenda (2 of 2)
Part Two - Practice
● Description of Demo System
● Example #1 - Externally Triggered Full GC
● Example #2 - External System Restart
● Example #3 - System-initiated Support Case
● Tools Demonstrated
● Other Ideas for Automation
Part One
Theory
What makes a distributed system?
● Multiple components
● Different servers
● Different regions (data center or geo)
● A component failure != service or app failure
● Requires systems thinking
Challenges faced by distributed systems
● Complexity
● Uncontrollable elements
● Hard to see the whole picture
● Impossible for a single person to manage
What can we do about it?
Easy answer:
Add people!
But… easy != correct
Better coping strategy
Enable your systems to
heal themselves
...which is why we are here!
Benefits of Self-Healing
● Better uptime (at the component level)
● Higher service quality
● Rapid identification of repeating issues
● Improved Ops team morale and productivity
Risk of Self-Healing Systems
● Worse uptime (at the component level)
● Lower service quality
● Maintenance complexities
● Degraded Ops team morale and productivity
Risks
So why take the risks?
Implemented well, self-healing systems can make for happier customers!
Failsafe Design
Bibel, G. D. Train Wreck: The
Forensics of Rail Disasters.
Baltimore: Johns Hopkins
UP, 2012. 69-70. Print.
Methodology
Identify the Problem
Design the Solution
Execute by Hand
Automate the solution
Watch and adjust
PSHAW!
Part Two
Practice
Demo Application
Java App Server Farm (n = 2)
Amazon Linux EC2 Instance
EC2 Elastic IP address
Load Balanced via DNS (Dyn Traffic Director)
Simple Web Application (HTTP/HTTPS)
Example #1
Externally Triggered Full GC
Real-life mPulse example
We started reporting Java statistics to our monitoring tool in 2013.
When investigating outages, we often found an exact correlation with large garbage collections (sound familiar?).
We set up an alert to fire when heap usage went above 70%.
Everybody into the war room!
Real-life mPulse example, cont’d
Engineering looks for a possible memory leak.
Eventually someone says, “Just force a GC!”
Most of the time, this would fix it. The JVM isn’t perfect; if we help it, the system remains stable.
Occasionally this didn’t fix it, which would indicate an actual bug.
Engineering fixes, deploy, repeat!
“Intermittent gratification”
90% of the time, there was no need to gather
everyone together.
Real-life mPulse example, cont’d
Engineering says…
Ops, can you fix it?
Identify the Problem
1. Java isn’t garbage-collecting efficiently.
2. Tuning the JVM is time-consuming and
dangerous.
3. Forcing a collection works, but it requires
waking someone up.
Describe a Solution (1 of 2)
Identify a metric for JVM Heap Use that is
indicative of the problem:
Java VM Old % Used
Start monitoring/reporting this metric.
Specify a threshold for action:
Old % Used > 65%
Describe a Solution (2 of 2)
When the threshold is reached, take an action:
Trigger a full garbage collection
After the action, monitor for success:
Old % Used < 65%
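The threshold check described above can be sketched as a small pure function. This is a minimal sketch in Node.js, not the script from the talk: it assumes the usual two-line output of `jstat -gcutil` (a header row and one data row) and finds the `O` column by name rather than by position. The names `parseOldPercent`, `needsFullGc`, and `OLD_PERCENT_THRESHOLD` are hypothetical.

```javascript
'use strict';

// Threshold from the slides: act when Old % Used > 65.
const OLD_PERCENT_THRESHOLD = 65;

// Hypothetical helper: extract old-generation utilization (the "O" column)
// from the text printed by `jstat -gcutil <pid>`.
function parseOldPercent(jstatOutput) {
  const lines = jstatOutput
    .split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== ''; });
  if (lines.length < 2) {
    throw new Error('expected a header line and a data line');
  }
  const header = lines[0].split(/\s+/);
  const values = lines[lines.length - 1].split(/\s+/);
  const index = header.indexOf('O');
  if (index === -1 || index >= values.length) {
    throw new Error('no "O" column found in jstat output');
  }
  const percent = parseFloat(values[index]);
  if (Number.isNaN(percent)) {
    throw new Error('could not parse old % used: ' + values[index]);
  }
  return percent;
}

// True when the metric is above the action threshold.
function needsFullGc(oldPercent, threshold) {
  const limit = threshold === undefined ? OLD_PERCENT_THRESHOLD : threshold;
  return oldPercent > limit;
}
```

Looking the column up by name is a design choice made for this sketch: the header layout differs between JDK versions, so a name lookup is less brittle than a fixed field number.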
Execute by Hand
Trigger the condition that causes the problem
(or be patient and let it happen).
Once monitoring indicates high old % used,
manually execute the full GC.
Automate the Solution, Manually Trigger
Write a script to check for Java old % used.
Run the script via cron or similar mechanism.
Report when old % used exceeds threshold.
A DevOps human will trigger the full GC.
Script Snippet
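# Threshold (percent of old generation in use) above which we act
oldpcnttrigger=65
# PID of the Tomcat JVM
JAVA_PID=$(pgrep -f -u tomcat /usr/lib/jvm/jre/bin/java)
# jstat prints a header line (containing "S0") and one data line; drop the header
RAW_JSTATS=$(jstat -gcutil "$JAVA_PID" | grep -v "S0")
# The fourth column (O) is old-generation utilization
old_pcnt_used=$(echo $RAW_JSTATS | cut -f4 -d" ")
# Round to an integer so the shell can compare it
integer_old_pcnt_used=$(echo "$old_pcnt_used" | awk '{ printf ("%1.0f", $1) }')
if [ "$integer_old_pcnt_used" -gt "$oldpcnttrigger" ]; then
  echo "Would trigger full GC here"
fi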
https://github.com/SOASTA/velocity-2015-self-healing-systems
DEMO (part 1)
Automate the Solution, Automate the Trigger
Taking the script shown previously, combine
the step that:
Reports that old % used > 65%
with the step that:
Triggers the full GC
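Combining the two steps might look like the following. This is a sketch under stated assumptions, not the code from the talk: `checkAndHeal` and its option names are hypothetical, and the three callbacks are injected so the logic can be exercised without a running JVM.

```javascript
'use strict';

// Sketch of one check-and-heal pass.
//   options.threshold         number, e.g. 65
//   options.readOldPercent()  returns the current Old % Used as a number
//   options.triggerFullGc()   requests a full collection
//   options.log(message)      records what happened
function checkAndHeal(options) {
  const threshold = options.threshold;
  const before = options.readOldPercent();
  if (before <= threshold) {
    // Below the threshold: nothing to do.
    return { action: 'none', before: before };
  }

  options.log('Old % used is ' + before + ', above ' + threshold +
    '; triggering full GC');
  options.triggerFullGc();

  // Monitor for success: the metric should drop back below the threshold.
  const after = options.readOldPercent();
  const healed = after < threshold;
  if (!healed) {
    options.log('Full GC did not help (' + after + '%); escalate to a human');
  }
  return { action: 'full-gc', before: before, after: after, healed: healed };
}
```

In a real deployment, `triggerFullGc` could shell out to `jcmd <pid> GC.run`, which is one way to request a collection; that choice is an assumption here, and the script in the linked repository may do it differently. The `healed: false` case corresponds to the situation described earlier where forcing a GC does not fix the problem and an actual bug is likely.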
DEMO (part 2)
Watch and adjust
Set up the automated script to run in as many
test environments as are available/applicable.
Review the results (script log, metrics graphs).
Does it work?
Investigate any issues thoroughly.
Potentially, install the script in a dry-run mode
in production.
Go Live!
We recommend a gradual deployment.
Deploy to a subset of production, then assess.
Expand the subset, assess again.
When all of production is live, enjoy more
sleep!
Example #2
Externally Triggered Restart
Real-life mPulse example
What is a beacon?
{"timestamp":1392256183739,"drop_code":"crumb:missing","http_method":"GET","http_version":"HTTP
/1.1","http_referrer":"","headers":{"host":"localhost:8080","accept":"*/*"},"params":{"nt_dns_e
nd":"1392147897985","nt_load_end":"1392147912182","nt_first_paint":"1392147900.964995","mem.use
d":"131000000","nt_spdy":"0","nt_unload_end":"1392147898577","nt_dns_st":"1392147897985","nt_co
n_st":"1392147897985","rt.bmr.conEn":"834.00000000006","rt.bmr.resEn":"2320.0000000001637","mem
.total":"199000000","nt_nav_st":"1392147897985","nt_domcontloaded_end":"1392147901891","dom.sz"
:"58549","rt.tstart":"1392147897985","rt.bmr.domSt":"419.0000000000964","nt_con_end":"139214789
7985","nt_domint":"1392147901585","nt_red_end":"0","dom.ln":"939","nt_unload_st":"1392147898574
","t_done":"14201","nt_load_st":"1392147912129","t_page":"13638","rt.end":"1392147912186","nt_d
omloading":"1392147898927","nt_res_end":"1392147898571","t_resp":"563","rt.bmr.domEn":"813.0000
000001019","rt.tt":"14201","nt_red_cnt":"0","if":"","nt_fet_st":"1392147897985","nt_res_st":"13
92147898548","nt_req_st":"1392147897995","nt_nav_type":"0","mob.ct":"0","dom.img":"16","nt_red_
st":"0","rt.ss":"1392147897985","config.timedout":"true","rt.bmr.resSt":"2312.0000000001255","r
t.si":"3el0j57fms0885mi-
n0uk6y","rt.sl":"1","rt.bmr.fetSt":"16.000000000076398","rt.bmr.conSt":"813.0000000001019","nt_
domcomp":"1392147912129","dom.script":"27","v":"0.9.1389663787","rt.bmr.reqSt":"834.00000000006
","r":"","rt.bstart":"1392147906107","rt.obo":"0","rt.start":"navigation","nt_domcontloaded_st"
:"1392147901585"}}
Real-life mPulse example, cont’d
Each server processes millions of these per day.
Beacons are logged to disk, eventually
compressed and uploaded to S3.
Real-life mPulse example, cont’d
Every so often, the background uploader thread
stops working.
(we don’t know why yet)
When this happens, we get 10-12 hours before
the disk fills up and the server dies.
Real-life mPulse example, cont’d
A simple re-start fixes it.
SO...
While developers are investigating, Ops is
getting paged (and woken up) to re-start boxes.
Ops says…
We can do better!
Identify the Problem (Demo App)
● Lack of activity indicates a failed thread
● While the issue goes unresolved, data is
delayed (and the disk may fill)
Describe a Solution
● A restart of the application solves the
problem
● The application server needs to be removed
from service prior to the restart
● The server hosting the application is an
AWS instance, and a reboot is fast and
effective
Execute by Hand
1. Take the application out-of-service
2. Restart the application
3. Watch for Self-Check OK
4. Put the application back in-service
Automate the Solution, Manually Trigger
● Log metrics go to AWS CloudWatch
● Lack of activity triggers an Alarm
● Alarm triggers an SNS notification
● Human being makes the DNS changes and restarts the server.
DEMO
Developers say…
We can do better!
Automate the Solution, Automate the Trigger
● EC2 and DynECT both have APIs
● DNS changes and reboot can all be
automated
● Todd can sleep!
Automate the Solution, Automate the Trigger
AWS Lambda
Upload code to Amazon (Node.js)
Attach it to a listener (SNS)
No instance required!
Automate the Solution, Automate the Trigger
Lambda function listens for the “logs are not being uploaded” notification.
Uses the Dyn REST API to disable the DNS record.
Uses the EC2 API to reboot the instance.
Automate the Solution, Automate the Trigger
Lambda function listens for the “all OK” notification.
Uses the Dyn REST API to re-enable the DNS record.
var dynect = require('./dynect_api.js');
var AWS = require('aws-sdk');

exports.cloudwatch_alarm_sns_handler = function(event, context) {
  event.Records.forEach(function(record) {
    var alarm = JSON.parse(record.Sns.Message);
    // Extract the instance status. ALARM means it's down, OK means it's up.
    var instance_up = alarm.NewStateValue !== "ALARM";
    // ...
  });
};
https://github.com/SOASTA/velocity-2015-self-healing-systems
Node.js code snippet
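The decision the Lambda function makes for each notification can be sketched separately from the API calls. This is an illustrative sketch, not the code in the repository: `planActions`, `instanceIdFrom`, and the action names are hypothetical. It assumes the alarm JSON carries `NewStateValue` and, for alarms on per-instance metrics, a `Trigger.Dimensions` list; an alarm built on a log metric may not include an instance dimension, in which case the instance would have to be identified some other way.

```javascript
'use strict';

// Hypothetical planner: turn one CloudWatch alarm notification (the JSON
// carried in record.Sns.Message) into an ordered list of actions.
function planActions(alarm) {
  if (alarm.NewStateValue === 'ALARM') {
    // Take the server out of DNS first, then reboot it.
    return ['disable-dns', 'reboot-instance'];
  }
  if (alarm.NewStateValue === 'OK') {
    // Server is healthy again: put it back into rotation.
    return ['enable-dns'];
  }
  // INSUFFICIENT_DATA or anything unexpected: take no automatic action.
  return [];
}

// Find the EC2 instance ID among the alarm's metric dimensions, if present.
function instanceIdFrom(alarm) {
  const dimensions = (alarm.Trigger && alarm.Trigger.Dimensions) || [];
  for (const dimension of dimensions) {
    if (dimension.name === 'InstanceId') {
      return dimension.value;
    }
  }
  return null;
}
```

Unlike the snippet above, which treats every state other than ALARM as "up", this sketch does nothing on INSUFFICIENT_DATA. That is a deliberately conservative choice for the example; either behavior may be appropriate depending on how the alarm is configured.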
New workflow
Look, no Todd!
DEMO
Watch and adjust
● Include Ops team on ALARM and
SELFCHECKOK notifications
● Observe effects - use monitoring tools to
assess availability
Example #3
Application files support ticket
Real-life mPulse example
● Customers configure raw beacon uploads to
their own S3 buckets.
● Sometimes they break things (or the AWS access key is changed, etc.)
● We log the error, but we don’t monitor it and
don’t notify customers.
Identify the Problem
● Another example: a user connecting to a site can’t authenticate successfully
● Assumption is that this is a limited-access site
DevOps says…
Now, let’s help our customers
succeed!
Describe a Solution
● Notify the Customer Support team
● Provide Support with details so that they can
proactively reach out
Execute by Hand
● Examine the logs for the error
● Review the situation with Support
● Work with Support to handle a case end-to-end
Automate the Solution, Manually Trigger
● Log metrics go to AWS CloudWatch
● Presence of error triggers an Alarm
● Alarm triggers an SNS notification
● Human being can then create a Zendesk case
Automate the Solution, Automate the Trigger
● AWS Lambda listens on SNS notification
● Collects information from the notification
● Files a Zendesk case categorized to go to
the correct team
AWS Lambda Actions
On Failed Login notification
● Create a Zendesk case with user details
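Building the case from the notification could look like this. It is a minimal sketch under assumptions: `buildTicket`, the `details` object, and the tag names are hypothetical, and the object returned is shaped like the JSON body of a Zendesk create-ticket request (`subject`, `comment.body`, `group_id`, `tags`). Sending the request and authenticating are left out.

```javascript
'use strict';

// Hypothetical builder for the body of a Zendesk "create ticket" request.
//   alarm   - parsed CloudWatch alarm JSON from the SNS message
//   details - assumed shape: { user: string, groupId: number }
function buildTicket(alarm, details) {
  return {
    ticket: {
      subject: 'Failed logins for ' + details.user,
      comment: {
        body: [
          'Filed automatically after a failed-login alarm.',
          'Alarm: ' + alarm.AlarmName,
          'Reason: ' + alarm.NewStateReason,
          'Time: ' + alarm.StateChangeTime,
          'User: ' + details.user
        ].join('\n')
      },
      // Routes the case to the right team; the group ID is deployment-specific.
      group_id: details.groupId,
      tags: ['self-healing', 'failed-login']
    }
  };
}
```

Keeping the payload builder free of network calls makes it easy to review with Support before any case is filed automatically.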
Watch and adjust
● Ops reviews logs
● Ops meets with Support to review case
frequency and outcomes
Testing Requirements
● Start small
● Develop (and verify) in stages
● Let it run in a production-like environment
● Verify behavior in “dry-run” mode
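One way to support a “dry-run” mode is to wrap every healing action so that a single setting decides whether it really runs. The sketch below is illustrative only; `makeAction` and the `settings` object are hypothetical names, not part of the talk's code.

```javascript
'use strict';

// Wrap a healing action so it can be switched between dry-run and live mode.
//   name     - label used in the log
//   run      - function that performs the real action
//   settings - assumed shape: { dryRun: boolean, log: function(message) }
// The returned function reports whether the action was actually executed.
function makeAction(name, run, settings) {
  return function () {
    if (settings.dryRun) {
      settings.log('[dry-run] would run: ' + name);
      return false;
    }
    settings.log('running: ' + name);
    run();
    return true;
  };
}
```

Because the flag is read on each call, the same deployed script can be left in dry-run mode in production, its log reviewed, and then switched to live without changing the code.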
Tools Demonstrated - AWS
CloudWatch http://aws.amazon.com/cloudwatch/
EC2 http://aws.amazon.com/ec2/
Lambda http://aws.amazon.com/lambda/
Linux http://aws.amazon.com/amazon-linux-ami/
Tools Demonstrated - Other
Datadog https://www.datadoghq.com/product/
Dyn Traffic Director http://dyn.com/traffic-director/
Monitis http://www.monitis.com/
PagerDuty http://www.pagerduty.com
Zendesk https://www.zendesk.com
See SOASTA at booth #801

More Related Content

What's hot

Building resilient applications
Building resilient applicationsBuilding resilient applications
Building resilient applicationsNuno Caneco
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUGslandelle
 
Do you know all of Puppet?
Do you know all of Puppet?Do you know all of Puppet?
Do you know all of Puppet?Julien Pivotto
 
Profiler Guided Java Performance Tuning
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuningosa_ora
 
Multithreading and concurrency in android
Multithreading and concurrency in androidMultithreading and concurrency in android
Multithreading and concurrency in androidRakesh Jha
 
Testing in android
Testing in androidTesting in android
Testing in androidjtrindade
 
Building Hermetic Systems (without Docker)
Building Hermetic Systems (without Docker)Building Hermetic Systems (without Docker)
Building Hermetic Systems (without Docker)William Farrell
 
Java concurrency in practice
Java concurrency in practiceJava concurrency in practice
Java concurrency in practiceMikalai Alimenkou
 
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Brian Brazil
 
Synchronization problem with threads
Synchronization problem with threadsSynchronization problem with threads
Synchronization problem with threadsSyed Zaid Irshad
 
Nullcon Hack IM 2011 walk through
Nullcon Hack IM 2011 walk throughNullcon Hack IM 2011 walk through
Nullcon Hack IM 2011 walk throughAnant Shrivastava
 
The End of the world as we know it - AKA your last NullPointerException $1B b...
The End of the world as we know it - AKA your last NullPointerException $1B b...The End of the world as we know it - AKA your last NullPointerException $1B b...
The End of the world as we know it - AKA your last NullPointerException $1B b...Michael Vorburger
 
Understanding Scratch Extensions with JavaScript (Part 2 of 2)
Understanding Scratch Extensions with JavaScript (Part 2 of 2)Understanding Scratch Extensions with JavaScript (Part 2 of 2)
Understanding Scratch Extensions with JavaScript (Part 2 of 2)Darren Adkinson
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & ProfilingIsuru Perera
 

What's hot (19)

Efficient Android Threading
Efficient Android ThreadingEfficient Android Threading
Efficient Android Threading
 
Building resilient applications
Building resilient applicationsBuilding resilient applications
Building resilient applications
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 
Do you know all of Puppet?
Do you know all of Puppet?Do you know all of Puppet?
Do you know all of Puppet?
 
Profiler Guided Java Performance Tuning
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuning
 
Performance tests with Gatling
Performance tests with GatlingPerformance tests with Gatling
Performance tests with Gatling
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Multithreading and concurrency in android
Multithreading and concurrency in androidMultithreading and concurrency in android
Multithreading and concurrency in android
 
Testing in android
Testing in androidTesting in android
Testing in android
 
Beyond Unit Testing
Beyond Unit TestingBeyond Unit Testing
Beyond Unit Testing
 
Android concurrency
Android concurrencyAndroid concurrency
Android concurrency
 
Building Hermetic Systems (without Docker)
Building Hermetic Systems (without Docker)Building Hermetic Systems (without Docker)
Building Hermetic Systems (without Docker)
 
Java concurrency in practice
Java concurrency in practiceJava concurrency in practice
Java concurrency in practice
 
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
 
Synchronization problem with threads
Synchronization problem with threadsSynchronization problem with threads
Synchronization problem with threads
 
Nullcon Hack IM 2011 walk through
Nullcon Hack IM 2011 walk throughNullcon Hack IM 2011 walk through
Nullcon Hack IM 2011 walk through
 
The End of the world as we know it - AKA your last NullPointerException $1B b...
The End of the world as we know it - AKA your last NullPointerException $1B b...The End of the world as we know it - AKA your last NullPointerException $1B b...
The End of the world as we know it - AKA your last NullPointerException $1B b...
 
Understanding Scratch Extensions with JavaScript (Part 2 of 2)
Understanding Scratch Extensions with JavaScript (Part 2 of 2)Understanding Scratch Extensions with JavaScript (Part 2 of 2)
Understanding Scratch Extensions with JavaScript (Part 2 of 2)
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
 

Viewers also liked

Keys To World-Class Retail Web Performance - Expert tips for holiday web read...
Keys To World-Class Retail Web Performance - Expert tips for holiday web read...Keys To World-Class Retail Web Performance - Expert tips for holiday web read...
Keys To World-Class Retail Web Performance - Expert tips for holiday web read...SOASTA
 
Metrics, metrics everywhere (but where the heck do you start?)
Metrics, metrics everywhere (but where the heck do you start?) Metrics, metrics everywhere (but where the heck do you start?)
Metrics, metrics everywhere (but where the heck do you start?) SOASTA
 
Closing the Mobile App Quality Gap webinar
Closing the Mobile App Quality Gap webinarClosing the Mobile App Quality Gap webinar
Closing the Mobile App Quality Gap webinarSOASTA
 
2015 05-29 velocity sc keynote
2015 05-29 velocity sc keynote2015 05-29 velocity sc keynote
2015 05-29 velocity sc keynoteSOASTA
 
Performance Warrior Tales: Cloud Load Testing the Retail Giants
Performance Warrior Tales: Cloud Load Testing the Retail Giants Performance Warrior Tales: Cloud Load Testing the Retail Giants
Performance Warrior Tales: Cloud Load Testing the Retail Giants SOASTA
 
EMEA Webinar - An Introduction to Real User Measurement
EMEA Webinar - An Introduction to Real User Measurement EMEA Webinar - An Introduction to Real User Measurement
EMEA Webinar - An Introduction to Real User Measurement SOASTA
 
Agile Load Testing In The Real World
Agile Load Testing In The Real WorldAgile Load Testing In The Real World
Agile Load Testing In The Real WorldSOASTA
 
Adopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliveryAdopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliverySOASTA
 
Testing In Production (TiP) Advances with Big Data and the Cloud
Testing In Production (TiP) Advances with Big Data and the CloudTesting In Production (TiP) Advances with Big Data and the Cloud
Testing In Production (TiP) Advances with Big Data and the CloudSOASTA
 
How to the Measure Business impact of Web Performance
How to the Measure Business impact of Web PerformanceHow to the Measure Business impact of Web Performance
How to the Measure Business impact of Web PerformanceSOASTA
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraSOASTA
 
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major Events
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major EventsO'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major Events
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major EventsSOASTA
 
Secrets to Realistic Load Testing
Secrets to Realistic Load TestingSecrets to Realistic Load Testing
Secrets to Realistic Load TestingSOASTA
 
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...SOASTA
 

Viewers also liked (14)

Keys To World-Class Retail Web Performance - Expert tips for holiday web read...
Keys To World-Class Retail Web Performance - Expert tips for holiday web read...Keys To World-Class Retail Web Performance - Expert tips for holiday web read...
Keys To World-Class Retail Web Performance - Expert tips for holiday web read...
 
Metrics, metrics everywhere (but where the heck do you start?)
Metrics, metrics everywhere (but where the heck do you start?) Metrics, metrics everywhere (but where the heck do you start?)
Metrics, metrics everywhere (but where the heck do you start?)
 
Closing the Mobile App Quality Gap webinar
Closing the Mobile App Quality Gap webinarClosing the Mobile App Quality Gap webinar
Closing the Mobile App Quality Gap webinar
 
2015 05-29 velocity sc keynote
2015 05-29 velocity sc keynote2015 05-29 velocity sc keynote
2015 05-29 velocity sc keynote
 
Performance Warrior Tales: Cloud Load Testing the Retail Giants
Performance Warrior Tales: Cloud Load Testing the Retail Giants Performance Warrior Tales: Cloud Load Testing the Retail Giants
Performance Warrior Tales: Cloud Load Testing the Retail Giants
 
EMEA Webinar - An Introduction to Real User Measurement
EMEA Webinar - An Introduction to Real User Measurement EMEA Webinar - An Introduction to Real User Measurement
EMEA Webinar - An Introduction to Real User Measurement
 
Agile Load Testing In The Real World
Agile Load Testing In The Real WorldAgile Load Testing In The Real World
Agile Load Testing In The Real World
 
Adopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliveryAdopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous Delivery
 
Testing In Production (TiP) Advances with Big Data and the Cloud
Testing In Production (TiP) Advances with Big Data and the CloudTesting In Production (TiP) Advances with Big Data and the Cloud
Testing In Production (TiP) Advances with Big Data and the Cloud
 
How to the Measure Business impact of Web Performance
How to the Measure Business impact of Web PerformanceHow to the Measure Business impact of Web Performance
How to the Measure Business impact of Web Performance
 
The Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest MentoraThe Four Hats of Load and Performance Testing with special guest Mentora
The Four Hats of Load and Performance Testing with special guest Mentora
 
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major Events
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major EventsO'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major Events
O'Reilly Webcast: How Nordstrom Prepares Its Site for Holidays and Major Events
 
Secrets to Realistic Load Testing
Secrets to Realistic Load TestingSecrets to Realistic Load Testing
Secrets to Realistic Load Testing
 
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...
How To Use Jenkins for Continuous Load and Mobile Testing with SOASTA & Cloud...
 

Similar to Velocity 2015: Building Self-Healing Systems

PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...Puppet
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the TrenchesYan Cui
 
Infrastructure as Code, Theory Crash Course
Infrastructure as Code, Theory Crash CourseInfrastructure as Code, Theory Crash Course
Infrastructure as Code, Theory Crash CourseDr. Sven Balnojan
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Creating testing tools to support development
Creating testing tools to support developmentCreating testing tools to support development
Creating testing tools to support developmentChema del Barco
 
Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2Matthew McCullough
 
A la découverte des google/mock (aka gmock)
A la découverte des google/mock (aka gmock)A la découverte des google/mock (aka gmock)
A la découverte des google/mock (aka gmock)Thierry Gayet
 
Building An Automated Infrastructure
Building An Automated InfrastructureBuilding An Automated Infrastructure
Building An Automated Infrastructureelliando dias
 
Building Automated Infrastructures
Building Automated InfrastructuresBuilding Automated Infrastructures
Building Automated Infrastructureselliando dias
 
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)Serverless in production, an experience report (Going Serverless, 28 Feb 2018)
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)Domas Lasauskas
 
Angular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupAngular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupDavid Barreto
 
Google mock training
Google mock trainingGoogle mock training
Google mock trainingThierry Gayet
 
Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Yan Cui
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go BadSteve Loughran
 
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacDefcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacPriyanka Aash
 
Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Yan Cui
 
Ajax Performance
Ajax PerformanceAjax Performance
Ajax Performancekaven yan
 
Analysing in depth work manager
Analysing in depth work managerAnalysing in depth work manager
Analysing in depth work managerbhatnagar.gaurav83
 
Operationalizing Clojure Confidently
Operationalizing Clojure ConfidentlyOperationalizing Clojure Confidently
Operationalizing Clojure ConfidentlyPrasanna Gautam
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil
 

Similar to Velocity 2015: Building Self-Healing Systems (20)

PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the Trenches
 
Infrastructure as Code, Theory Crash Course
Infrastructure as Code, Theory Crash CourseInfrastructure as Code, Theory Crash Course
Infrastructure as Code, Theory Crash Course
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Creating testing tools to support development
Creating testing tools to support developmentCreating testing tools to support development
Creating testing tools to support development
 
Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2
 
A la découverte des google/mock (aka gmock)
A la découverte des google/mock (aka gmock)A la découverte des google/mock (aka gmock)
A la découverte des google/mock (aka gmock)
 
Building An Automated Infrastructure
Building An Automated InfrastructureBuilding An Automated Infrastructure
Building An Automated Infrastructure
 
Building Automated Infrastructures
Building Automated InfrastructuresBuilding Automated Infrastructures
Building Automated Infrastructures
 
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)Serverless in production, an experience report (Going Serverless, 28 Feb 2018)
Serverless in production, an experience report (Going Serverless, 28 Feb 2018)
 
Angular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupAngular Optimization Web Performance Meetup
Angular Optimization Web Performance Meetup
 
Google mock training
Google mock trainingGoogle mock training
Google mock training
 
Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go Bad
 
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attacDefcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
Defcon 22-paul-mcmillan-attacking-the-iot-using-timing-attac
 
Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)
 
Ajax Performance
Ajax PerformanceAjax Performance
Ajax Performance
 
Analysing in depth work manager
Analysing in depth work managerAnalysing in depth work manager
Analysing in depth work manager
 
Operationalizing Clojure Confidently
Operationalizing Clojure ConfidentlyOperationalizing Clojure Confidently
Operationalizing Clojure Confidently
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
 

Velocity 2015: Building Self-Healing Systems

  • 2. Speaker Intro - Todd ● Director of Ops for ● Over 25 years in IT ● Experience with both academic and enterprise computing ● Favorite operating system is Tru64 ● Enjoys solving problems...but loves sleep more! @toddminnella tminnella@soasta.com
  • 3. Speaker Intro - Matt ● VP of Engineering for ● Started programming with Atari BASIC in elementary school ● Ops on the side :-) ● First Velocity presentation! @msolnit msolnit@soasta.com
  • 4. Who are you? :-) http://www.cliarthut.com/clip-arts/751/who-are-you-clip-art-751173.jpg
  • 5. Agenda (1 of 2) Part One - Theory ● Distributed Systems Challenges ● Mitigating Failure Impact ● Benefits and Risks ● Testing Requirements ● Methodology
  • 6. Agenda (2 of 2) Part Two - Practice ● Description of Demo System ● Example #1 - Externally Triggered Full GC ● Example #2 - External System Restart ● Example #3 - System-initiated Support Case ● Tools Demonstrated ● Other Ideas for Automation
  • 8. What makes a distributed system? ● Multiple components ● Different servers ● Different regions (data center or geo) ● A component failure != service or app failure ● Requires systems thinking
  • 9. Challenges faced by dist. systems ● Complexity ● Uncontrollable elements ● Hard to see the whole picture ● Impossible for a single person to manage
  • 10. What can we do about it? Easy answer: Add people! But… easy != correct
  • 11. Better coping strategy Enable your systems to heal themselves ...which is why we are here!
  • 12. Benefits of Self-Healing ● Better uptime (at the component level) ● Higher service quality ● Rapid identification of repeating issues ● Improved Ops team morale and productivity
  • 13. Risk of Self-Healing Systems ● Worse uptime (at the component level) ● Lower service quality ● Maintenance complexities ● Degraded Ops team morale and productivity
  • 14. Risks
  • 15. So why take the risks? Implemented well, self-healing systems can make for happier customers!
  • 16. Failsafe Design Bibel, G. D. Train Wreck: The Forensics of Rail Disasters. Baltimore: Johns Hopkins UP, 2012. 69-70. Print.
  • 17. Methodology ● Identify the Problem ● Design the Solution ● Execute by Hand ● Automate the solution ● Watch and adjust PSHAW!
  • 19. Demo Application Java App Server Farm (n = 2) Amazon Linux EC2 Instance EC2 Elastic IP address Load Balanced via DNS (Dyn Traffic Director) Simple Web Application (HTTP/HTTPS)
  • 21. Real-life mPulse example Started reporting Java statistics to monitoring tool in 2013. When investigating outages, often found an exact correlation with large garbage collections (sound familiar?). Set up an alert to fire when heap usage went above 70%. Everybody into the war room!
  • 23. Real-life mPulse example, cont’d Engineering looks for a possible memory leak. Eventually someone says, “Just force a GC!” Most of the time, this would fix it. The JVM isn’t perfect; if we help it, the system remains stable. Occasionally this didn’t fix it, which would indicate an actual bug. Engineering fixes, deploys, repeat!
  • 24. Real-life mPulse example, cont’d “Intermittent gratification”: 90% of the time, there was no need to gather everyone together.
  • 26. Identify the Problem 1. Java isn’t garbage-collecting efficiently. 2. Tuning the JVM is time-consuming and dangerous. 3. Forcing a collection works, but it requires waking someone up.
  • 27. Describe a Solution (1 of 2) Identify a metric for JVM Heap Use that is indicative of the problem: Java VM Old % Used Start monitoring/reporting this metric. Specify a threshold for action: Old % Used > 65%
  • 28. Describe a Solution (2 of 2) When the threshold is reached, take an action: Trigger a full garbage collection After the action, monitor for success: Old % Used < 65%
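The check, act, verify cycle described on these two slides can be sketched as follows. This is a minimal illustration, not the presenters' code: only the 65% threshold comes from the slides, while `needsFullGc`, `healOnce`, and the injected `readOldPercent` and `triggerFullGc` callbacks are hypothetical names, so the sketch assumes nothing about how the metric is read or how the collection is forced.

```javascript
// Threshold taken from the slides; everything else here is illustrative.
const OLD_PERCENT_THRESHOLD = 65;

// True when one reading of "Java VM Old % Used" calls for action.
function needsFullGc(oldPercentUsed, threshold = OLD_PERCENT_THRESHOLD) {
  return oldPercentUsed > threshold;
}

// Run one check-act-verify cycle. readOldPercent and triggerFullGc are
// hypothetical callbacks supplied by the caller.
function healOnce(readOldPercent, triggerFullGc, threshold = OLD_PERCENT_THRESHOLD) {
  const before = readOldPercent();
  if (!needsFullGc(before, threshold)) {
    // Nothing to do: the metric is at or below the threshold.
    return { acted: false, before, after: before, ok: true };
  }
  triggerFullGc();
  // Re-read the metric; success means it dropped below the threshold.
  const after = readOldPercent();
  return { acted: true, before, after, ok: after < threshold };
}
```

Returning both readings makes it easy to log each cycle and to spot the case the slides mention, where forcing a collection does not help and a real bug is likely.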
  • 29. Execute by Hand Trigger the condition that causes the problem (or be patient and let it happen). Once monitoring indicates high old % used, manually execute the full GC.
  • 30. Automate the Solution, Manually Trigger Write a script to check for Java old % used. Run the script via cron or similar mechanism. Report when old % used exceeds threshold. A DevOps human will trigger the full GC.
  • 31. Script Snippet
        JAVA_PID=`pgrep -f -u tomcat /usr/lib/jvm/jre/bin/java`
        RAW_JSTATS=`jstat -gcutil $JAVA_PID | grep -v "S0"`
        old_pcnt_used=`echo $RAW_JSTATS | cut -f4 -d" "`
        integer_old_pcnt_used=`echo $old_pcnt_used | awk '{ printf ("%1.0f", $1) }'`
        if [ $integer_old_pcnt_used -gt $oldpcnttrigger ]; then
          echo "Would trigger full GC here"
        fi
        https://github.com/SOASTA/velocity-2015-self-healing-systems
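The shell snippet picks the fourth field of the `jstat -gcutil` output. As an alternative sketch (not the presenters' script), the same value can be found by looking up the `O` column by its header name, which avoids depending on the column position. The function name `parseOldPercentUsed` is hypothetical, and the sample text is made up for illustration rather than captured from a real JVM.

```javascript
// Extract the old-generation utilisation (the "O" column) from the text
// printed by `jstat -gcutil <pid>`: one header line followed by one data line.
function parseOldPercentUsed(jstatOutput) {
  const rows = jstatOutput
    .trim()
    .split("\n")
    .map((line) => line.trim().split(/\s+/));
  if (rows.length < 2) {
    throw new Error("expected a header line and a data line");
  }
  const [header, values] = rows;
  const index = header.indexOf("O");
  if (index === -1) {
    throw new Error('no "O" column in jstat header');
  }
  const value = Number.parseFloat(values[index]);
  if (Number.isNaN(value)) {
    throw new Error("old % used is not a number");
  }
  return value;
}

// Illustrative sample only.
const sample = [
  "  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT",
  "  0.00  42.11  17.30  71.84  95.20  90.02     12    0.210     2    0.340    0.550",
].join("\n");

console.log(parseOldPercentUsed(sample)); // prints 71.84
```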
  • 33. Automate the Solution, Automate the Trigger Taking the script shown previously, combine the step that: Reports that old % used > 65% with the step that: Triggers the full GC
  • 35. Watch and adjust Set up the automated script to run in as many test environments as are available/applicable. Review the results (script log, metrics graphs). Does it work? Investigate any issues thoroughly. Potentially, install the script in a dry-run mode in production.
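One way to picture the dry-run mode mentioned here is a wrapper that logs what it would have done instead of doing it. This is a sketch under assumptions: `makeRemediator` and its `dryRun`, `action`, and `log` options are hypothetical names, not part of the presenters' script.

```javascript
// Wrap a remediation so that, in dry-run mode, it only records the decision.
// `action` and `log` are hypothetical callbacks supplied by the caller.
function makeRemediator({ dryRun, action, log }) {
  return function remediate(reason) {
    if (dryRun) {
      log(`DRY RUN: would act because ${reason}`);
      return false; // nothing was changed
    }
    log(`Acting because ${reason}`);
    action();
    return true; // the action ran
  };
}
```

Because the decision path is identical in both modes, the dry-run log shows how often the live script would have fired before it is allowed to touch production.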
  • 36. Go Live! We recommend a gradual deployment. Deploy to a subset of production, then assess. Expand the subset, assess again. When all of production is live, enjoy more sleep!
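A gradual deployment needs some rule for choosing which hosts run the automation at each stage. The sketch below is one simple possibility, assuming a flat list of host names; `rolloutSubset` is a hypothetical helper. Sorting first means that each larger percentage contains the hosts chosen at the smaller one, so expanding the subset never moves a host out of the rollout.

```javascript
// Return the hosts that should run the automation at a given rollout stage.
function rolloutSubset(hosts, percent) {
  if (percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...hosts].sort();
  const count = Math.ceil((ordered.length * percent) / 100);
  return ordered.slice(0, count);
}
```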
  • 39. Real-life mPulse example What is a beacon? {"timestamp":1392256183739,"drop_code":"crumb:missing","http_method":"GET","http_version":"HTTP/1.1","http_referrer":"","headers":{"host":"localhost:8080","accept":"*/*"},"params":{"nt_dns_end":"1392147897985","nt_load_end":"1392147912182","nt_first_paint":"1392147900.964995","mem.used":"131000000","nt_spdy":"0","nt_unload_end":"1392147898577","nt_dns_st":"1392147897985","nt_con_st":"1392147897985","rt.bmr.conEn":"834.00000000006","rt.bmr.resEn":"2320.0000000001637","mem.total":"199000000","nt_nav_st":"1392147897985","nt_domcontloaded_end":"1392147901891","dom.sz":"58549","rt.tstart":"1392147897985","rt.bmr.domSt":"419.0000000000964","nt_con_end":"1392147897985","nt_domint":"1392147901585","nt_red_end":"0","dom.ln":"939","nt_unload_st":"1392147898574","t_done":"14201","nt_load_st":"1392147912129","t_page":"13638","rt.end":"1392147912186","nt_domloading":"1392147898927","nt_res_end":"1392147898571","t_resp":"563","rt.bmr.domEn":"813.0000000001019","rt.tt":"14201","nt_red_cnt":"0","if":"","nt_fet_st":"1392147897985","nt_res_st":"1392147898548","nt_req_st":"1392147897995","nt_nav_type":"0","mob.ct":"0","dom.img":"16","nt_red_st":"0","rt.ss":"1392147897985","config.timedout":"true","rt.bmr.resSt":"2312.0000000001255","rt.si":"3el0j57fms0885mi-n0uk6y","rt.sl":"1","rt.bmr.fetSt":"16.000000000076398","rt.bmr.conSt":"813.0000000001019","nt_domcomp":"1392147912129","dom.script":"27","v":"0.9.1389663787","rt.bmr.reqSt":"834.00000000006","r":"","rt.bstart":"1392147906107","rt.obo":"0","rt.start":"navigation","nt_domcontloaded_st":"1392147901585"}}
  • 40. Real-life mPulse example, cont’d Each server processes millions of these per day. Beacons are logged to disk, eventually compressed and uploaded to S3.
  • 41. Real-life mPulse example, cont’d Every so often, the background uploader thread stops working. (we don’t know why yet) When this happens, we get 10-12 hours before the disk fills up and the server dies.
  • 42. Real-life mPulse example, cont’d A simple re-start fixes it. SO... While developers are investigating, Ops is getting paged (and woken up) to re-start boxes.
  • 43. Ops says… We can do better!
  • 44. Identify the Problem (Demo App) ● Lack of activity indicates a failed thread ● While the issue goes unresolved, data is delayed (and the disk may fill)
  • 45. Describe a Solution ● A restart of the application solves the problem ● The application server needs to be removed from service prior to the restart ● The server hosting the application is an AWS instance, and a reboot is fast and effective
  • 46. Execute by Hand 1. Take the application out-of-service 2. Restart the application 3. Watch for Self-Check OK 4. Put the application back in-service
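The four manual steps above can be sketched as one ordered routine. This is an illustration only: `restartWithDrain` and the callbacks in `steps` are hypothetical, and the retry count and delay are arbitrary defaults. If the self-check never passes, the sketch leaves the server out of service and returns `false`, on the assumption that a human should then take a look.

```javascript
// Run the manual procedure in order. Every function in `steps` is a
// hypothetical async callback supplied by the caller.
async function restartWithDrain(steps, { attempts = 10, delayMs = 1000 } = {}) {
  await steps.takeOutOfService(); // 1. take the application out of service
  await steps.restart(); // 2. restart the application
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    // 3. watch for Self-Check OK
    if (await steps.selfCheckOk()) {
      await steps.putInService(); // 4. put the application back in service
      return true;
    }
    // Wait before polling the self-check again.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // self-check never passed; server stays out of service
}
```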
  • 47. Automate the Solution, Manually Trigger ● Log metrics go to AWS CloudWatch ● Lack of activity triggers an Alarm ● Alarm triggers an SNS notification ● Human being makes the DNS changes and restarts the server.
  • 48. DEMO
  • 50. Automate the Solution, Automate the Trigger ● EC2 and DynECT both have APIs ● DNS changes and reboot can all be automated ● Todd can sleep!
  • 51. Automate the Solution, Automate the Trigger AWS Lambda Upload code to Amazon (Node.js) Attach it to a listener (SNS) No instance required!
  • 52. Automate the Solution, Automate the Trigger Lambda function listens on “logs are not being uploaded” notification. Uses Dyn REST API to disable the DNS record. Uses EC2 API to re-boot the instance.
  • 53. Automate the Solution, Automate the Trigger Lambda function listens on “all OK” notification. Uses Dyn REST API to re-enable the DNS record.
  • 54. Node.js code snippet
        var dynect = require('./dynect_api.js');
        var AWS = require('aws-sdk');
        exports.cloudwatch_alarm_sns_handler = function(event, context) {
          event.Records.forEach(function(record) {
            var alarm = JSON.parse(record.Sns.Message);
            // Extract the instance status. ALARM means it's down, OK means it's up.
            var instance_up = alarm.NewStateValue !== "ALARM";
            // ...
        https://github.com/SOASTA/velocity-2015-self-healing-systems
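The slide's snippet stops after it reads the alarm state. The sketch below shows only the decision that slides 52 and 53 describe, and it does not reproduce the presenters' handler or any Dyn or EC2 call. `planActions`, `plansForEvent`, and the action labels are hypothetical. It also treats any state other than `ALARM` or `OK` (for example CloudWatch's `INSUFFICIENT_DATA`) as "do nothing", which is a design choice of this sketch and differs from the snippet, where every non-`ALARM` state counts as up.

```javascript
// Map one SNS-delivered CloudWatch alarm message to a list of action labels.
// The labels are placeholders; a real handler would call the DNS and EC2 APIs.
function planActions(snsMessage) {
  const alarm = JSON.parse(snsMessage);
  switch (alarm.NewStateValue) {
    case "ALARM":
      // Logs are not being uploaded: pull the server from DNS, then reboot it.
      return ["disable-dns-record", "reboot-instance"];
    case "OK":
      // Self-check is passing again: put the server back into DNS.
      return ["enable-dns-record"];
    default:
      // Unknown or indeterminate state: take no action.
      return [];
  }
}

// Apply the same mapping to every record in an SNS event.
function plansForEvent(event) {
  return event.Records.map((record) => planActions(record.Sns.Message));
}
```

Keeping the decision separate from the API calls makes it possible to exercise the handler logic with canned events before it is attached to a live SNS topic.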
  • 56. DEMO
  • 57. Watch and adjust ● Include Ops team on ALARM and SELFCHECKOK notifications ● Observe effects - use monitoring tools to assess availability
  • 59. Real-life mPulse example ● Customers configure raw beacon uploads to their own S3 buckets. ● Sometimes they break things (or AWS access key is changed, etc.) ● We log the error, but we don’t monitor it and don’t notify customers.
  • 60. Identify the Problem ● Another example: user connecting to a site can’t authenticate successfully ● Assumption is that this is a limited access site
  • 61. DevOps says… Now, let’s help our customers succeed!
  • 62. Describe a Solution ● Notify the Customer Support team ● Provide Support with details so that they can proactively reach out
  • 63. Execute by Hand ● Examine the logs for the error ● Review the situation with Support ● Work with Support to handle a case end-to-end
  • 64. Automate the Solution, Manually Trigger ● Log metrics go to AWS CloudWatch ● Presence of error triggers an Alarm ● Alarm triggers a SNS notification ● Human being can then create a Zendesk case
  • 65. Automate the Solution, Automate the Trigger ● AWS Lambda listens on SNS notification ● Collects information from the notification ● Files a Zendesk case categorized to go to the correct team
  • 66. AWS Lambda Actions On Failed Login notification ● Create a Zendesk case with user details
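Creating the case comes down to turning the notification into a ticket request body. The sketch below builds only that body and sends nothing. The `{ ticket: { subject, comment: { body }, tags } }` shape follows Zendesk's ticket-creation request as I understand it and should be checked against the current API documentation. The `failure` fields (`user`, `time`, `source`), the tag names, and `buildTicket` itself are assumptions made for illustration.

```javascript
// Build a ticket request body from a failed-login notification.
// The fields on `failure` are hypothetical; a real notification may differ.
function buildTicket(failure) {
  return {
    ticket: {
      subject: `Failed login for ${failure.user}`,
      comment: {
        body:
          `User ${failure.user} failed to authenticate at ${failure.time}. ` +
          `Source: ${failure.source}.`,
      },
      // Tags give Support a way to route and count these cases.
      tags: ["self-healing", "failed-login"],
    },
  };
}
```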
  • 67. Watch and adjust ● Ops reviews logs ● Ops meets with Support to review case frequency and outcomes
  • 68. Testing Requirements ● Start small ● Develop (and verify) in stages ● Let run in production-like environment ● Verify behavior in “dry-run” mode
  • 69. Tools Demonstrated - AWS ● CloudWatch http://aws.amazon.com/cloudwatch/ ● EC2 http://aws.amazon.com/ec2/ ● Lambda http://aws.amazon.com/lambda/ ● Linux http://aws.amazon.com/amazon-linux-ami/
  • 70. Tools Demonstrated - Other ● Datadog https://www.datadoghq.com/product/ ● Dyn Traffic Director http://dyn.com/traffic-director/ ● Monitis http://www.monitis.com/ ● PagerDuty http://www.pagerduty.com ● Zendesk https://www.zendesk.com
  • 71. See SOASTA at booth #801