Feature: Ruby on Rails Application Monitoring with Cucumber
In order to ensure continuous application availability
A developer should be able to assert the behavior of production apps
From the outside in
Without using antiquated monitoring tools
To protect revenue
Continuous (Production) Integration: Ruby on Rails Application Monitoring with Cucumber
1. Feature: Ruby on Rails Application Monitoring with Cucumber
In order to ensure continuous application availability
A developer should be able to assert the behavior of production apps
From the outside in
Without using antiquated monitoring tools
To protect revenue
2. VP of Research & Development
railsmachine.com
jesse@railsmachine.com
@jnewland
github.com/jnewland
About me:
I get to hack on Ruby tools to manage large Rails deployments all day long. Not a bad job,
eh?
3. Before we get into monitoring or cucumber, let’s talk about testing.
In my career as a dev, my testing habits have evolved over time, largely inspired by available
tools.
I’m sure some of you have shared a similar journey - let’s take a quick look back.
4. No more clicking around
Save in your editor / refresh in your browser / lather / rinse / repeat.
Occasional human-performed quality assurance
Broken by design
5. I then made the jump to unit testing using Ruby’s Test::Unit - specifically the Model and
Controller tests that Rails generates.
This was nice, but it was often devalued by stakeholders due to poor communication of the
business value of this work on my part.
6. R
Enter RSpec and the BDD movement.
RSpec helped me, and I’m sure a lot of others, associate the business value with writing
tests / specs.
Stakeholder-digestible code if you’re really good, stakeholder-digestible output if you’re
doing things right.
7. CUCUMBER
Basically, BDD nirvana. Stakeholder-*writable* if you’re crazy.
8. Cucumber lets software
development teams describe
how software should behave in
plain text. The text is written in
a business-readable domain-
specific language and serves
as documentation, automated
tests and development-aid - all
rolled into one format.
For those of you who aren’t familiar with Cuke, this is how it describes itself.
9. TATFT!
The most important part of the evolution of these tools is that they make it easy and -
legitimately - fun to test first and test all of the time as you’re developing your application.
10. Production
Monitoring
But what about production? We’re testing all the time in development, while we’re developing
the app that’s going to create revenue. But in production...
11. Revenue
Preservation
...there’s actually revenue being earned. Why not test with the same rigor in production?
12. Current
Monitoring
Landscape
Quiz:
* Raise your hand if you are at least partially responsible for the continuous operation of a
business-critical production Rails app
* If you have ZERO monitoring of the site’s uptime - meaning your customers or boss would
be the one to tell you that the homepage was down - put your hand down
* If your monitoring solution runs on your server itself - monit or god, for example - put
your hand down
* If your external monitoring solution only hits one URL on the site, put your hand down
Some sites are monitored very closely, but I’ve found that in most cases, the monitoring of
many production apps is rather slim.
I generally evaluate monitoring solutions on two axes:
19. Bad things can happen when he’s not looking.
For example, in Rails apps, I see this happen all the time with...
20. Search is a part of many applications that I’ve seen go unmonitored. I’m not singling out
sphinx here - this is just a sweet picture - the same thing happens to Solr, etc
21. Search can fail when the rest of a site works fine due to many reasons:
* search daemon may go down
* the indices may be corrupt
* or things may fail in a more interesting kind of way...
22. 0 results for “beer”
Wherein no results are returned when they obviously should be.
28. It’s the industry standard tool for infra monitoring. I haven’t met a single person who has
used Nagios and is an honest fan of it. The most widely despised part of Nagios
29. is the noise. Unless expertly configured, Nagios is a noisy beast. This leads to “boy who
cried wolf” scenarios, wherein real alerts are written off as noise and discarded.
30. EVIL
Because of the noise, the piece-of-crap interface, the esoteric configuration language, and
years and years of being woken up by false positives, I’m going to paint this all in black and
white and just call Nagios evil.
31. Pingdom’s a relatively new tool that’s gained a good bit of traction. It’s a hosted monitoring
service that can test HTTP and many other types of services from a network of computers
around the world.
37. Business
Value
Disconnect
However, one thing that all of these tools are missing is a clear link between the business
value of the things they’re checking and the alerts they’re sending out.
39. Cucumber lets software
development teams describe
how software should behave in
plain text. The text is written in
a business-readable domain-
specific language and serves
as documentation, automated
tests and development-aid - all
rolled into one format.
Cucumber has served me well in bringing stakeholders and developers together.
40. Cucumber lets software
development teams describe
how software should behave in
plain text. The text is written in
a business-readable domain-
specific language and serves
as documentation, automated
tests and development-aid - all
rolled into one format.
But with a couple of quick edits...
41. Cucumber also lets operations
teams describe how
infrastructure should behave in
plain text. The text is written in a
business-readable domain-
specific language and serves
as documentation, monitoring
and deployment-aid - all rolled
into one format.
...we have a tool that can help us bring together developers, operations, *and* stakeholders.
42. #devops
Some of you following the twitterz may have noticed some people in the ops and
development space talking about the ‘devops movement’
43. devs
ops
working together
While calling this a movement is pretty wild - a hashtag does not a movement make - the
ideas surrounding this ‘movement’ are things that I believe in personally, and things we’re
working on every day at Rails Machine - blurring the line between development and ops, and
the line between the infrastructure and the application.
44. Cucumber also lets #devops
teams describe how
applications should behave in
plain text. The text is written in a
business-readable domain-
specific language and serves
as documentation, monitoring
and deployment-aid - all rolled
into one format.
Using Cucumber in production embodies everything that is devops, and can blur those lines
even more.
48. Feature: slashdot.com
  To keep the geek masses satisfied
  Slashdot must be responsive
  Scenario: Cached pages are super quick
    Given I am benchmarking
    When I go to http://slashdot.org/
    Then the elapsed time should be less than 500 milliseconds
    When I follow "Login"
    Then the elapsed time should be less than 500 milliseconds
    When I follow "Contact"
    Then the elapsed time should be less than 500 milliseconds
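The benchmarking steps above presumably boil down to timing an action and comparing the elapsed time against a budget. A minimal Ruby sketch of that idea follows; `elapsed_ms` and `within_budget?` are hypothetical helper names, not the step definitions used in the talk, and the `sleep` stands in for a real page request.

```ruby
# Hypothetical helpers; not the step definitions used in the talk.

# Runs the block and returns how long it took, in milliseconds.
# A monotonic clock is used so wall-clock adjustments cannot skew the result.
def elapsed_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  (finished - started) * 1000.0
end

# True when the measured time is strictly under the budget.
def within_budget?(measured_ms, budget_ms)
  measured_ms < budget_ms
end

# Stand-in for an HTTP request; a real check would fetch the page here.
took = elapsed_ms { sleep 0.01 }
puts within_budget?(took, 500) ? "OK" : "SLOW (#{took.round} ms)"
```

Measuring around the whole request, rather than reading a server-side metric, keeps the check "outside in": it sees what a visitor would see, network included.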
50. Feature: Signup Emails
  In order to prevent bots from taking over the site
  A new user should receive a verification email upon signup
  Scenario: New User signup
    Given I visit "http://example.com"
    And I follow "Signup!"
    When I signup with a random email address and password
    And I press "Go"
    And I wait 10 seconds # an unfortunate reality
    Then I should have one email in my inbox
    And the email subject should match "^Welcome"
    And the email body should match "http://example.com/v/\w+"
https://github.com/technicalpickles/mailinator-spec
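The two "should match" steps above are regular-expression checks against the received message. A small Ruby sketch of the matching itself, assuming the body pattern is meant to end in a word-character token (`\w+`); the method names and the sample message are made up for illustration, and fetching mail from the inbox is out of scope here.

```ruby
# Hypothetical helpers and sample data; fetching the email is not shown.

# The subject must start with "Welcome".
def welcome_subject?(subject)
  subject.match?(/^Welcome/)
end

# The body must contain a verification link followed by a token.
# %r{} avoids escaping the slashes; the dots are escaped here so they
# only match literal dots, which is stricter than the pattern in the feature.
def verification_link?(body)
  body.match?(%r{http://example\.com/v/\w+})
end

subject = "Welcome to example.com"
body    = "Please verify your account: http://example.com/v/abc123"

puts welcome_subject?(subject) && verification_link?(body) ? "email OK" : "email BAD"
```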
52. Feature: Response Time
  As an impatient user
  Our web server should be in tip-top shape
  So our app can be super fast
  Background:
    Given my Scout account name is 'railsmachine'
    And my Scout email and password are 'jesse@railsmachine.com' and 'sekret'
  Scenario: Passenger Queue
    When I get the metrics from the 'Passenger' plugin on 'example.com'
    Then the 'passenger_queue_depth' should be 0
  Scenario: CPU usage is low
    When I get the metrics from the 'Server Overview' plugin on 'example.com'
    Then 'cpu_last_minute' should be less than 1
http://github.com/jnewland/cucumber-scout/
53. Feature: Response Time
  As an impatient user
  Our app should be super fast
  Background:
    Given my NewRelic license key is 'omgwtfbbq'
  Scenario: Average Response time
    Given that my application is being monitored by New Relic
    Then my application's 'response time' should be less than 500 milliseconds
  Scenario: Apdex
    Given that my application is being monitored by New Relic
    Then my application's 'apdex' should be 1
http://github.com/jnewland/cucumber-newrelic
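For readers unfamiliar with the Apdex score asserted above: it is the share of satisfied requests plus half the share of tolerating ones, where "satisfied" means a response time at or under a threshold T and "tolerating" means at or under 4T. A score of 1 therefore means every sampled request was satisfied. The Ruby sketch below computes it from raw response times; the `apdex` method and the 0.5 second threshold are chosen for illustration and say nothing about how New Relic or cucumber-newrelic compute or fetch the value.

```ruby
# Illustrative only; not part of cucumber-newrelic.
# Apdex = (satisfied + tolerating / 2) / total samples
#   satisfied:  response time <= t
#   tolerating: t < response time <= 4t
#   frustrated: everything slower (contributes nothing)
def apdex(response_times, t)
  return nil if response_times.empty?

  satisfied  = response_times.count { |s| s <= t }
  tolerating = response_times.count { |s| s > t && s <= 4 * t }
  (satisfied + tolerating / 2.0) / response_times.size
end

# Response times in seconds; threshold assumed to be 0.5 s for this example.
samples = [0.1, 0.2, 0.3, 1.2]
puts apdex(samples, 0.5)
```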
55. Feature: Cucumber wiki discoverability
  In order to learn more about Cucumber
  As an uninformed developer
  I should be able to easily find the GitHub wiki
  Scenario: Searching for Cucumber on Google
    When I go to http://www.google.com/
    And I fill in "q" with "cucumber"
    And I press "Google Search"
    Then I should see "BDD that talks to domain experts first and code second"
57. Feature: example.org ssh logins
  As a user of example.org
  I need to login remotely
  Scenario: Login with a key
    Given I have the following public keys:
      | keyfile                    |
      | /home/jnewland/.ssh/id_dsa |
    Then I can ssh to the following hosts with these credentials:
      | hostname         | username |
      | example.org      | jnewland |
      | mail.example.org | jnewland |
  Scenario: Checking /etc/passwd
    When I ssh to "example.org" with the following credentials:
      | username | password | keyfile                    |
      | jnewland |          | /home/jnewland/.ssh/id_dsa |
    And I run "cat /etc/passwd"
    Then I should see "jnewland" in the output
    And I should not see "that_dude_we_just_fired" in the output
http://github.com/auxesis/cucumber-nagios
59. Feature: RAID
  To ensure optimal server operation
  And guarantee data is stored redundantly
  The RAID array should be in a good state
  Scenario: RAID Array status
    When I check the raid array status
    Then controller "1" should have a status of "optimal"
    And controller "2" should have a status of "optimal"
    And controller "1" should have "1" logical device with a status of "optimal"
    And controller "1" should have "4" drives in "online" state
    And controller "2" should have "1" logical device with a status of "optimal"
    And controller "2" should have "4" drives in "online" state
http://github.com/auxesis/cucumber-nagios
61. Feature: rubygems.org
  As a member of the Ruby community
  I should be able to easily install Ruby gems
  Scenario: DNS
    When I lookup "rubygems.org"
    Then the name should resolve an IP
http://github.com/auxesis/cucumber-nagios
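A DNS check like the one above can be approximated in plain Ruby with the standard library's `Resolv`. This is a sketch of the idea, not how cucumber-nagios implements its steps; `resolved_addresses` and `resolves?` are hypothetical names, and the resolver is passed in so the logic can be exercised without touching the network.

```ruby
require "resolv"

# Hypothetical helpers; not the cucumber-nagios implementation.

# Returns the addresses a name resolves to, as strings.
# `resolver` only needs to respond to getaddresses(name); it defaults to
# the stdlib Resolv module, which consults the hosts file and DNS.
def resolved_addresses(name, resolver: Resolv)
  resolver.getaddresses(name).map(&:to_s)
end

# True when the name resolves to at least one address.
def resolves?(name, resolver: Resolv)
  !resolved_addresses(name, resolver: resolver).empty?
end
```

Calling `resolves?("rubygems.org")` with the default resolver performs a real lookup; in a test, any object with a `getaddresses` method can stand in for it.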
67. Sorta Quick Setup (Generator coming soon!)
$ gem install cucumber-json cucumber-newrelic cucumber-scout cucumber-nagios
$ cd RAILS_ROOT
$ mkdir -p production_features/step_definitions
$ mkdir -p production_features/support
$ vi config/cucumber.yml
production: production_features -f Cucumber::Formatter::JSON --out tmp/cuke.json
$ vi production_features/support/env.rb
require 'cucumber/nagios/steps'
require 'cucumber/newrelic'
require 'cucumber/scout'
# etc
$ # hack on features
$ cucumber -p production # doesn’t load the Rails env, just the defined steps
$ # profit!
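To turn a run like this into an alert, one option is to wrap the command and map its exit status onto the Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), relying on the fact that `cucumber` exits non-zero when a scenario fails. The Ruby sketch below is an assumption about how that glue might look, not something shown in the talk; `check_status` is a hypothetical name, and the demo line runs the Ruby interpreter itself in place of `cucumber`.

```ruby
require "rbconfig"

# Hypothetical glue code; not part of any of the gems above.
# Runs a command quietly and maps its result onto Nagios-style exit codes.
def check_status(*command)
  case system(*command, out: File::NULL, err: File::NULL)
  when true  then 0 # command exited with status 0: OK
  when false then 2 # command ran but exited non-zero: CRITICAL
  else            3 # nil, the command could not be executed: UNKNOWN
  end
end

# Stand-in for a failing cucumber run.
puts check_status(RbConfig.ruby, "-e", "exit 1")
```

From RAILS_ROOT, a cron job or Nagios check could then end with `exit check_status("cucumber", "-p", "production")`.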