Are you unhappy with the state of monitoring in your organization? Are you successfully automating “all the things” except your monitoring checks? Are you tired of looking at monitoring dashboards that hark back to another era? Do you long to access your monitoring system via a REST API?
Paperless Post recently solved these problems by replacing Nagios with Sensu, a new and awesome free monitoring and metrics router that is designed with configuration management and cloud deployments in mind.
In my presentation we’ll take an in-depth look into why we chose Sensu and how we monitor our services and collect system metrics to send to Graphite. Subtopics will include how we planned for and executed the migration, mistakes we made along the way, how we knew when to scale and how we did it. I’ll also cover how we’re making our Sensu setup redundant and highly available, how we’re monitoring and collecting metrics about Sensu, and how we’ve integrated our internal tools with Sensu.
Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu
1. SENSE AND SENSU-BILITY
Painless Metrics And Monitoring In The Cloud with Sensu
Bethany Erskine
Velocity NYC 2013
http://github.com/skymob/sensu-tutorial
Monday, October 14, 13
2. BEFORE I BEGIN...
IF YOU DID NOT SET UP SENSU-TUTORIAL BEFORE THE CLASS:
1. grab a USB key
2. follow the instructions on the README
If you don’t have a computer, no sweat!
15. TODAY at PAPERLESS
Two Sensu environments (prod/testing)
~250-275 instances of sensu-client
4-6 sensu-server instances
25k metrics/hour to Graphite
1 custom dashboard
1 custom CLI
16. RESOURCES
✓ All of our Sensu infrastructure is virtualized.
✓ We typically give a sensu-server box 1.5GB RAM and 2 processors, scaling up RAM for any box running more than one Sensu service on it.
✓ 4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one)
18. NEEDS MORE SENSU
✓ High load on Sensu server
✓ Backed-up queues in RabbitMQ
✓ TIP: set up a check to monitor the RabbitMQ ready queue size; you'll want an email when the queue grows to about 10K and stays there
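A minimal sketch of the threshold logic behind such a queue-size check, written in Python for illustration. The 10K figure comes from the slide; the lower warning threshold, the function name, and the message wording are assumptions. The statuses follow the Nagios-style convention (0 OK, 1 warning, 2 critical).

```python
# Hypothetical check logic: classify a RabbitMQ ready-queue depth.
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style check statuses


def evaluate_queue_depth(messages_ready, warn=5000, crit=10000):
    """Return (status, message) for a given ready-queue depth.

    The 10000 critical threshold mirrors the slide; 5000 is an assumed
    warning level chosen only for this example.
    """
    if messages_ready >= crit:
        return CRITICAL, "CRITICAL: %d messages ready" % messages_ready
    if messages_ready >= warn:
        return WARNING, "WARNING: %d messages ready" % messages_ready
    return OK, "OK: %d messages ready" % messages_ready


# In a real check the depth would be read from RabbitMQ; here it is hard-coded.
for depth in (120, 6000, 12000):
    print(evaluate_queue_depth(depth))
```

A real check would exit with the returned status so the monitoring system can act on it.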
19. HOW TO SCALE
✓ Add more sensu-server instances
✓ No special configuration needed
✓ checks will be distributed in round-robin fashion to the sensu-servers
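To illustrate what round-robin distribution means, the toy sketch below hands work items to servers in turn. It is only a model of the idea, not Sensu code, and the server and check names are made up.

```python
from itertools import cycle


def distribute(results, servers):
    """Assign each result to the next server in turn (round-robin)."""
    assignment = {server: [] for server in servers}
    for result, server in zip(results, cycle(servers)):
        assignment[server].append(result)
    return assignment


# Hypothetical server names and work items, for illustration only.
servers = ["sensu-server-1", "sensu-server-2", "sensu-server-3"]
results = ["check-%d" % i for i in range(7)]
for server, assigned in distribute(results, servers).items():
    print(server, assigned)
```

Adding a fourth name to `servers` spreads the same work across four lists without any other change, which mirrors the "no special configuration" point on the slide.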
20. GRAPHITE PAINS
✓ symptoms: backed up queues in RabbitMQ, spotty graphs
✓ cluster couldn’t keep up with the large amount of metrics we were now serving it via AMQP
21. GRAPHITE PAINS
✓ Solution: stop collecting metrics every 10 seconds (excessive!)
✓ moved staging metrics to staging Graphite cluster
✓ Moved prod Graphite cluster to SSD
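The effect of a longer collection interval is simple arithmetic. The sketch below uses made-up numbers (10 clients, 20 metrics each) purely to show the proportions; none of these figures come from the talk, and 60 seconds is an assumed replacement interval.

```python
def datapoints_per_hour(clients, metrics_per_client, interval_seconds):
    """Datapoints sent per hour when every metric is collected each interval."""
    collections_per_hour = 3600 // interval_seconds
    return clients * metrics_per_client * collections_per_hour


# Hypothetical fleet: 10 clients, 20 metrics each.
every_10s = datapoints_per_hour(10, 20, 10)
every_60s = datapoints_per_hour(10, 20, 60)
print(every_10s, every_60s, every_10s // every_60s)
```

Under these assumptions, moving from a 10-second to a 60-second interval cuts the datapoint volume by a factor of six.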
26. STEP 3: DEFINE GLOBALS
✓ CHECKS: must be actionable!
✓ METRICS: go nuts
✓ HANDLERS: EMAIL for everything initially, added Pagerduty later.
27. OUR GLOBALS
✓ CHECKS: disk usage, swap usage, zombie processes, RO filesystems
✓ METRICS: vmstat, disk usage, cpu, memory, interface and disk perf
✓ HANDLERS: Email, Campfire, Pagerduty
28. STEP 4: DEFINE SPECIFICS
✓ For each server role, define additional states to be checked and alerted on:
✓ Process Checks
✓ System Checks
✓ Service Checks
✓ Service Metrics
29. STEP 5: SET UP A PLACE TO TEST
✓ Set up a permanent testing Sensu stack using your CM tool of choice
✓ we used sensu-chef cookbook
30. STEP 6: SET A WORKFLOW
✓ Develop and document a workflow for implementing, testing, deploying and signing off on checks
✓ You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu
31. EXAMPLE WORKFLOW
✓ add new sensu_check definitions to the appropriate cookbook in Chef
✓ deploy new check to staging env using Chef
✓ Pull Request with sample graphs or alerts
✓ Code Review from colleague
✓ Deploy to Prod
33. STEP 7: EXECUTE WORKFLOW
✓ Starting with the low-hanging fruit (plugins that already existed in sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server
✓ deploy sensu-client to a few select machines
34. STEP 8: WATCH THE WATCHER
✓ Set up some bare-minimum 3rd party monitoring for the Sensu servers
✓ We use Panopta’s agent to check for aliveness, disk usage and CPU usage.
36. MONITOR THE MONITOR
✓ Other ideas: have Testing Sensu monitor Prod Sensu
✓ Sensu can collect metrics about itself
37. STEP 9: ROLLOUT
✓ Deploy your Production server infrastructure
✓ Roll out the client and checks to the rest of your prod environments.
38. STEP 10: TUNE
✓ Expect to need to tune thresholds and alert occurrences.
✓ Laissez le bon alertes roulent!
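One way to think about tuning occurrences: only alert once a check has failed some number of times in a row. A minimal sketch, assuming an event carries an `occurrences` counter alongside the check status; the function name and the threshold of 3 are hypothetical, and this is not Sensu's own filtering code.

```python
def should_alert(event, min_occurrences=3):
    """Alert only when a failing check has repeated at least min_occurrences times."""
    failing = event["check"]["status"] != 0
    return failing and event["occurrences"] >= min_occurrences


# Hypothetical events: one failure is ignored, the third in a row alerts.
first_failure = {"check": {"name": "disk_usage", "status": 1}, "occurrences": 1}
third_failure = {"check": {"name": "disk_usage", "status": 1}, "occurrences": 3}
print(should_alert(first_failure), should_alert(third_failure))
```

Raising `min_occurrences` trades alert latency for fewer false alarms, which is the kind of tuning this step refers to.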
42. LET’S PLAY WITH SENSU
If you haven’t been able to get your sandboxes up and running, please pair with someone near you.
43. SANDBOX GOALS
✓ Get familiar with Sensu configuration
✓ Deploy a check
✓ Trigger an alert on that check
✓ Install a Handler
✓ Give you something to take home and hack on
44. OOPS
If you mess anything up:
vagrant halt; vagrant up
Worst case:
vagrant destroy; vagrant up
46. SENSU CONFIGURATION
✓ Please open up a terminal and SSH into both your sensu-server and sensu-client VMs
✓ sudo su
✓ cd /etc/sensu
47. SENSU CONFIGURATION
✓ /etc/sensu/config.json - config for redis, rabbitmq, api and dashboard
✓ /etc/sensu/conf.d/ - checks go here
✓ /etc/sensu/conf.d/client.json - client configuration, subscriptions
✓ /etc/sensu/{extensions|handlers|mutators|plugins}
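A rough sketch of how a layout like this can come together, assuming the separate JSON files are merged into one settings tree. The merge function and the inlined file contents are illustrative assumptions, not Sensu's actual loader or the tutorial's real files.

```python
import json


def deep_merge(base, overlay):
    """Recursively merge overlay into a copy of base; overlay wins on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical file contents, inlined as strings so the example is self-contained.
config_json = '{"rabbitmq": {"host": "localhost"}, "redis": {"host": "localhost"}}'
client_json = '{"client": {"name": "sensu-client", "subscriptions": ["all"]}}'
check_json = '{"checks": {"example": {"command": "true", "interval": 60}}}'

settings = {}
for document in (config_json, client_json, check_json):
    settings = deep_merge(settings, json.loads(document))
print(sorted(settings))
```

Keeping each check in its own file under conf.d/ means a configuration management tool can add or remove one check without touching the rest.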
49. CHECK YOUR DASHBOARD
✓ Open a web browser and go to http://10.254.254.10:8080
✓ username: admin / password: secret
50. HANDLERS
✓ A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers
✓ Examples: send an email, send event to Pagerduty, send metrics to Graphite
✓ Default is “debug”
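As a rough illustration of the pipe case: a pipe handler is a program that receives the event as JSON on standard input. The Python sketch below turns such an event into a one-line summary. The event fields shown (client name, check name, status, output) are an assumed, simplified shape, and the function name and sample values are hypothetical.

```python
import io
import json

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def summarize_event(stream):
    """Read one JSON event from a file-like object and return a summary line."""
    event = json.load(stream)
    check = event["check"]
    status = STATUS_NAMES.get(check["status"], "UNKNOWN")
    return "%s %s/%s: %s" % (
        status,
        event["client"]["name"],
        check["name"],
        check["output"].strip(),
    )


# A real pipe handler would call summarize_event(sys.stdin); a made-up
# sample event is fed in here so the example runs on its own.
sample = io.StringIO(json.dumps({
    "client": {"name": "web01"},
    "check": {"name": "example_check", "status": 1,
              "output": "no matching processes found\n"},
}))
print(summarize_event(sample))
```

The summary line could then become an email subject, a chat message, or anything else the handler is meant to produce.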
51. HANDLER EXAMPLES
✓ BASIC: send an email to ops@
✓ ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional ec2 instances)
52. HANDLERS
✓ Let’s configure an EMAIL handler to send an informative email for an event.
✓ /etc/sensu/handlers/mailer.rb plugin is installed for you, we just need to configure and install it
53. CONFIGURE THE PLUGIN
ON SENSU SERVER:
vim /etc/sensu/conf.d/handlers/mailer.json
{
  "mailer": {
    "mail_from": "sensu@you.com",
    "mail_to": "you@yourdomain.com"
  }
}
59. CHECKS
✓ Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription).
✓ A CHECK sends a RESULT as an EVENT to a HANDLER - this applies to anything - service checks, metrics, etc
60. CHECK EXECUTION
✓ Either scheduled by the server (subscription) or scheduled by the client (standalone)
✓ Today we will configure a subscription-based check on the server that will run on our client
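The two scheduling modes differ mainly in one attribute of the check definition. The sketch below builds both kinds as Python dictionaries and prints them as JSON. The check names, the command, the "webservers" subscription, and the 60-second interval are all hypothetical; the attribute names are an assumption about the check definition format and should be confirmed against the Sensu documentation.

```python
import json

# Hypothetical subscription check: defined on the server, run by every
# client subscribed to "webservers".
subscription_check = {
    "checks": {
        "example_subscription_check": {
            "command": "check-example.rb",
            "subscribers": ["webservers"],
            "interval": 60,
        }
    }
}

# Hypothetical standalone check: defined and scheduled on the client itself.
standalone_check = {
    "checks": {
        "example_standalone_check": {
            "command": "check-example.rb",
            "standalone": True,
            "interval": 60,
        }
    }
}


def scheduling_mode(definition):
    """Return 'standalone' or 'subscription' for a single check definition."""
    return "standalone" if definition.get("standalone") else "subscription"


for config in (subscription_check, standalone_check):
    for name, definition in config["checks"].items():
        print(name, scheduling_mode(definition))
        print(json.dumps({name: definition}, indent=2))
```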
61. LET’S CONFIGURE A CHECK
✓ Use check-procs.rb to make sure at least one instance of cornbread is running
62. DETERMINE OUR CHECK COMMAND
On your SENSU CLIENT:
/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
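The logic this command asks for can be sketched in a few lines of Python: count the processes matching a pattern and warn when there are fewer than the `-W` value. This assumes `-W1` means "warn when fewer than 1 match", which fits the goal of keeping at least one cornbread instance running. It illustrates the idea only and is not the implementation of check-procs.rb; the function name and sample process listings are made up.

```python
import re

OK, WARNING = 0, 1  # Nagios-style check statuses


def check_process_count(process_lines, pattern, warn_under=1):
    """Return (status, message) based on how many lines match pattern."""
    matcher = re.compile(pattern)
    count = sum(1 for line in process_lines if matcher.search(line))
    if count < warn_under:
        return WARNING, "WARNING: found %d matching processes" % count
    return OK, "OK: found %d matching processes" % count


# Made-up process listings standing in for the output of `ps`.
running = ["/usr/sbin/sshd -D", "/usr/local/bin/cornbread --serve"]
stopped = ["/usr/sbin/sshd -D"]
print(check_process_count(running, "cornbread"))
print(check_process_count(stopped, "cornbread"))
```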
63. INSTALL OUR CHECK
✓ On your SENSU SERVER:
✓ vim /etc/sensu/conf.d/checks/cornbread_process.json
74. MY SENSU TIPS
✓ install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ )
✓ lock your plugins’ gem dependency versions
75. TIPS TIPS TIPS
✓ have alternate ways to access your Dashboard information
✓ we integrated our command-line developer tools with Sensu API
✓ we also created our own Ops dashboard that queries Sensu, Graphite and our app for data
77. HA SENSU
✓ Redundancy is easy (bring up more sensu-servers)
✓ Making Redis and RabbitMQ HA is more challenging
✓ We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now
78. WHERE TO GO FOR HELP
✓ IRC: #sensu - freenode
✓ sensu-users mailing list
✓ http://docs.sensuapp.org
I’m curious: how many Nagios users do we have here? Anyone here running Sensu in production?
How many people here are happy with their monitoring system?
And how many of you are here today because you think monitoring sucks?
So, the reason I’m here today is because I can safely say that I LOVE monitoring. But it hasn’t always been this way....
In 2011 I joined Paperless Post as the second ever member of the Operations team. Our new small team faced many challenges, one of which was fixing our monitoring infrastructure’s sad state. We had one monolithic Nagios server with no version control and no configuration management. Every time we needed to add hosts, services, etc, we manually edited the files and restarted the Nagios server. Even though I had years of experience living with Nagios and could write configs in my sleep, I dreaded ever having to add new hosts or checks.
Our metrics collection setup was in even worse shape. Munin was deployed for a handful of servers, but was so awkward to work with it’d all but been abandoned. We were sending data directly from our Rails app to Graphite, but no server system metrics were making it there at all. This was no way to be. But I don’t want to spend all morning telling you how much Nagios sucks. Let me tell you a little monitoring love story about Paperless Post and Sensu.
In the fall of 2012 we’d outgrown our old managed hosting service, found a new provider, and were making preparations to move to a new datacenter. At this point our entire infrastructure, with the exception of Nagios, was managed by Chef, and our plan was to bring up the new datacenter infrastructure entirely using Chef. We saw an opportunity to start fresh and explored our options, and quickly fell in love with Sensu.
Sensu is Ruby, which we know and love at Paperless. Although the Sensu components are written in Ruby, checks and plugins can be written in any language.
There was already a fully-featured Chef recipe for Sensu - in fact, Sensu was designed with configuration management in mind.
We saw an opportunity to get involved with a young project that we could potentially contribute back to. The sensu-community-plugins were my first real open-source contributions, and after nearly two years with it, I feel strongly enough about the project to keep supporting it in any way I can, which is why I’m here today.
Because we were on a tight deadline to deploy Sensu, the prospect of re-using existing Nagios checks appealed to me, with the option of re-writing them in Ruby using the Sensu plugin libraries later on down the line.
Metrics and Checks all handled by one system. We were fully sold on being able to gather metrics using the same client that ran our health checks, and were excited about the prospect of seeing our system-level metrics on the same system as our application-level metrics via Graphite/Graphiti.
Sensu had potential to scale easily, something we’d end up needing to call on later
Sensu is an incredibly flexible tool. I’ve yet to come up with a device or situation that couldn’t somehow be handled by Sensu. It’s sometimes referred to as a monitoring “router”, which is a very accurate description. It can handle any input and pass it off to any other script, system, or handler that you want.
Sensu LOVES the cloud and deals beautifully with ephemeral machine environments. We simply added an API call to our devtools so that deleting a node is as simple as saying `pp sensu delete_client foo`. This command can also be run from Jenkins or even theoretically from a client node itself before shutting down.
We're able to silence entire environments at a time using one simple command: `pp sensu silence production` collects all production nodes from Chef and then silences them using the Sensu API.
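A rough sketch of the kind of API calls commands like these could wrap. The `pp` tool is internal and its code is not shown in this talk, so everything below is an assumption: the API address, the node names, the `reason` text, and the helper names are all hypothetical. The endpoints used (`DELETE /clients/:name`, and `POST /stashes` with a `silence/<client>` path) reflect the Sensu API as I understand it for this generation of Sensu; check the API docs for your version before relying on them.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical API location; adjust for your environment.
SENSU_API = "http://sensu-api.example.com:4567"

# Build (but do not send) the request that removes a client from Sensu.
def delete_client_request(name)
  Net::HTTP::Delete.new(URI("#{SENSU_API}/clients/#{name}"))
end

# Build one silence stash request per node. In the real workflow the node
# list would come from Chef; here it is simply passed in.
def silence_requests(nodes, reason: "maintenance")
  nodes.map do |node|
    req = Net::HTTP::Post.new(URI("#{SENSU_API}/stashes"),
                              "Content-Type" => "application/json")
    req.body = JSON.generate(
      path: "silence/#{node}",
      content: { reason: reason, timestamp: Time.now.to_i }
    )
    req
  end
end

# Send a prepared request. Not called in this sketch, so running the file
# has no side effects.
def send_request(req)
  Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }
end
```

Keeping request construction separate from sending makes it easy to dry-run the tool against a list of nodes before touching the real API.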
Most plugins use the sensu-plugin gem and are written in Ruby, but all languages are welcome.
A little about our Sensu setup at Paperless Post.
We have two Sensu environments: production and testing. Production runs 3, sometimes 4 instances of sensu-server, and testing 1, sometimes 2. We do not have this elasticity automated, but I’ll touch a little later on when we know to scale out by adding another sensu-server to the cluster.
We’re pushing 25K metrics per hour through Sensu to our Graphite cluster using Sensu’s AMQP handler.
Overall our transition from Nagios to Sensu was incredibly smooth. But as we grew there were of course problems here and there...
Initially we’d deployed a single Sensu server to handle all of production, but it became obvious it was time to scale when we saw some of these symptoms: high load on sensu server and backed-up queues in RabbitMQ. We have a Sensu check set up to alert us if the RabbitMQ queue size grows over 10K messages and stays there for longer than five minutes.
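One way a "stays there for longer than five minutes" rule could be expressed is a check that runs every minute and only notifies after five consecutive failures. This is a sketch under assumptions, not our actual config: the plugin command and its flag are placeholders, the subscription name is made up, and `occurrences` is, as far as I know, honored by handlers built on the sensu-plugin handler library rather than by sensu-server itself.

```ruby
require "json"

# Build a check definition hash in the shape Sensu reads from its JSON
# config files. The command is a placeholder, not a real plugin invocation.
def queue_depth_check(max_messages:, sustained_minutes:)
  {
    checks: {
      rabbitmq_queue_depth: {
        command: "check-queue-depth.rb --critical #{max_messages}", # placeholder
        subscribers: ["rabbitmq"],      # hypothetical subscription name
        interval: 60,                   # run once a minute
        occurrences: sustained_minutes, # notify only after N consecutive failures
        handlers: ["default"]
      }
    }
  }
end

puts JSON.pretty_generate(queue_depth_check(max_messages: 10_000, sustained_minutes: 5))
```

With a 60-second interval and five occurrences, a queue has to stay over the threshold for roughly five minutes before anyone is notified.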
How do you scale? It’s as simple as bootstrapping another sensu-server. In our case, that means the Chef role[sensu-server] (which brings up a box running just sensu-server - no API or Dashboard).
No other special configuration is needed, just use the same config as the rest of the environment, and checks will be distributed in a round-robin fashion to your sensu servers.
The only major pains we experienced with Sensu have been related to Graphite. We started seeing backed-up queues in RabbitMQ and spotty graphs.
Throwing more sensu-servers at the problem didn’t help in this case, and it turns out that our Graphite cluster just couldn’t keep up with the large amount of metrics we were now serving it via AMQP.
AMQP works, but in some ways isn’t ideal - in our case, AMQP bypasses carbon-relay and thus the replication schema, and sends every metric to every cluster node, which is overkill for a six-node cluster.
We experimented with writing our own consumer, but ended up with the following solution: we stopped collecting metrics every 10 seconds (which was overkill anyway), and moved our staging metrics off of the production Graphite cluster and onto their own staging Graphite cluster. We then moved the production Graphite cluster’s VMs on to SSDs. In fact, I spent most of last week writing scripts to migrate Whisper files off of a six-node VMware Graphite cluster onto a 2-node dedicated hardware cluster w/ SSDs.
Now, I want to tell you our tried and true method for a successful and happy transition from Nagios, or your monitoring system of choice, to Sensu.
There is a lot of talk in the Ops community about Alert Fatigue, and moving to a new monitoring system is a golden opportunity to clear your slate, clean up your alerts and determine what you REALLY care about.
Also, because of differences in the way each monitoring system implements checks, it usually makes sense to just start from scratch rather than try to port existing check schemas over to a new system.
This is a great opportunity to stop sending emails for things that don't matter - do you really need an email every time your CPU is pegged? Probably not.
Metrics and Monitoring planning spreadsheet is a tool we used to survey all of our servers and determine what needed to be gathered and monitored.
I’ve shared this document with you on my Github in the “sensu-tutorial” repository. This spreadsheet contains a column for ... Example:
DETERMINE YOUR BASELINE - For ‘base’ role we made a list of things we wanted to know about every single machine.
Our criterion for a CHECK is that it must be actionable.
If it’s something we want to know but don’t necessarily need to act on, make it a METRIC.
For CHECKS, we monitor disk usage, swap usage, zombie processes, and RO filesystems on every machine.
For METRICS, we gather vmstat, disk usage, cpu, memory, interface and disk performance metrics on every machine.
HANDLERS, we chose email for everything initially, then added Pagerduty later for only the most critical, must-wake-up-at-3am type alerts. We have a dedicated room in Campfire for receiving Sensu alerts.
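To make the check/metric split above concrete, here is a minimal sketch of what each kind of output could look like. A check reports a status through its exit code (0 OK, 1 WARNING, 2 CRITICAL, the convention Sensu shares with Nagios plugins), while a metric just emits data points, here in Graphite's plaintext format of path, value, and timestamp. The thresholds and metric path are invented for illustration.

```ruby
# CHECK: map disk usage onto an exit status and a human-readable message.
# The 85/95 thresholds are illustrative assumptions, not our real values.
def disk_check(used_pct, warn: 85, crit: 95)
  if used_pct >= crit
    [2, "CRITICAL: disk #{used_pct}% used"]
  elsif used_pct >= warn
    [1, "WARNING: disk #{used_pct}% used"]
  else
    [0, "OK: disk #{used_pct}% used"]
  end
end

# METRIC: format one data point as "<path> <value> <timestamp>".
def graphite_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}"
end

status, message = disk_check(91)
puts message                                          # a plugin would then `exit status`
puts graphite_line("web1.disk.root.used_pct", 91)     # hypothetical metric path
```

The same measurement can feed both: alert on it only if someone needs to act, and graph it either way.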
DEFINE SPECIFICS For each role (in our case, Chef roles, but could be any machine, device or server role), we gathered the following:
Process Checks (at least 4 Unicorn workers should be running but no more than 20)
System Checks (anything beyond our baseline system checks - say maybe we want to check for RO mounts only on servers that actually mount something)
Service Checks (database locks, database connections, HTTP response)
Service Metrics (haproxy bytes in/out)
Other
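As an illustration of the process check described above (at least 4 Unicorn workers, no more than 20), here is a small plain-Ruby sketch. It is not the plugin we run: the process-matching pattern and the sample `ps` output are assumptions, and a real plugin would more likely build on the sensu-plugin gem.

```ruby
# Count lines of `ps` output that match a pattern.
def count_processes(ps_output, pattern)
  ps_output.lines.count { |line| line.match?(pattern) }
end

# Turn a worker count into an exit status and message.
def worker_status(count, min: 4, max: 20)
  if count < min
    [2, "CRITICAL: #{count} workers running, expected at least #{min}"]
  elsif count > max
    [2, "CRITICAL: #{count} workers running, expected at most #{max}"]
  else
    [0, "OK: #{count} workers running"]
  end
end

# Invented sample output; on a real host you might use `ps -eo args` instead.
SAMPLE_PS = <<~PS
  unicorn master -c config/unicorn.rb
  unicorn worker[0] -c config/unicorn.rb
  unicorn worker[1] -c config/unicorn.rb
  unicorn worker[2] -c config/unicorn.rb
  unicorn worker[3] -c config/unicorn.rb
PS

status, message = worker_status(count_processes(SAMPLE_PS, /unicorn worker/))
puts message
```

Checking both bounds catches two different failures: workers dying off, and workers piling up after a bad deploy or restart.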
SET UP A TESTING ENVIRONMENT: This will get you familiar with deploying and administrating Sensu.
I strongly recommend having a permanent place to test all of your Sensu checks and configuration changes using your CM tool of choice. It can be dual purpose and serve your staging environments, and is a good place to test things like Sensu package upgrades.
We set up a Testing sensu infrastructure in the old datacenter, deploying using sensu-chef cookbook, which we customized as needed
Develop a workflow for implementing, testing, deploying and signing off on checks.
You’ll get the best check coverage if anyone on your team (developers, ops) can easily add checks or metrics to Sensu.
Our workflow at Paperless Post: using Chef (which we’re deploying using our devtools with the help of Jenkins), we develop and deploy our checks to testing environment. We then do a pull-request, including any notes about how we tested or metrics sample graphs or outputs. We have a colleague do a quick code review and approve that pull request, then we deploy to prod.
Now the fun part: START DEPLOYING CHECKS! Starting with the low-hanging fruit (checks that use plugins that already exist in the sensu-community-plugins repository), start deploying each check that you defined in the worksheet to the testing Sensu server.
If a suitable plugin didn’t already exist in sensu-community-plugins, we had two choices: 1) re-use a Nagios check or 2) write our own in Ruby or Bash.
Monitor your monitoring system! This should be self-explanatory. Set up some bare-minimum 3rd party monitoring for the Sensu servers themselves so you’ll know if the VM goes completely down (this has not yet happened to us!) or runs out of disk space.
We use Panopta’s agent-based monitor to check for aliveness, disk usage and CPU usage.
Other ideas: have your testing Sensu setup monitor production Sensu.
Sensu can collect metrics about itself so there’s no need for a 3rd party system there.
This step is simple: Deploy your now well-tested server infrastructure using your now well-tested Configuration Management recipes. This should go smoothly because you’ve had plenty of practice rolling out and administering your testing setup as well as all of your checks.
First you’ll want to stand up the production Sensu server stack, then you’ll roll out sensu-client to the rest of your production servers or VMs.
Let the alerts roll in! You’ll likely need to tune thresholds, alert occurrences, etc once you have your checks running against actual production traffic.
Quick overview of the Sensu architecture and how it’s deployed on your VirtualBoxes. Sensu uses RabbitMQ for all communication between the client and the server. RabbitMQ and Redis are both running on your sensu-server VM, as well as the Dashboard (not pictured here), the API, and the Server. Redis is used to persist data for use by the API.
The Sensu package contains all of its dependencies in an "omnibus" installer, meaning it embeds everything it needs into /opt/sensu. This is great because you don’t need to worry about whether your system ruby is going to work with it, and you don’t even need to install system-wide ruby if you don’t need it.
BREAK HERE if needed :)
A little background on the Sandbox. I used Vagrant and Chef to bring up these boxes. The original Vagrantfile will be available online for you. I didn’t want to spend too much time showing you how to deploy Sensu with Chef because I didn’t want to give the impression that Chef is your only option for deploying Sensu. However, if you are already familiar with Chef, you can check my sensu-tutorial github to see (and use) the recipes used to build these boxes.
Today we’re going to do some hand-configuration, just for you to get familiar with how Checks and Handlers work, but in reality, you’d be using your configuration management system of choice to deploy all of these.
If you open config.json on both the sensu-server and sensu-client VMs, you’ll see they are exactly the same.
Let’s jump right in and trigger an alert! By default, a Keepalive warning alert will be raised if the server doesn’t hear from the client after 120 seconds, critical threshold is 180. This is tunable on a per-client basis.
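A small sketch of how those keepalive thresholds behave, plus what a per-client override might look like. The client name, address, and subscription are made up, and the `keepalive.thresholds` layout is my understanding of the client config format; confirm it against the docs for your Sensu version.

```ruby
require "json"

# Defaults described above: warning at 120s of silence, critical at 180s.
DEFAULT_THRESHOLDS = { warning: 120, critical: 180 }.freeze

# Classify a client by how long it has been since its last keepalive.
# Returns 0 (OK), 1 (WARNING), or 2 (CRITICAL).
def keepalive_status(seconds_since_last, thresholds = DEFAULT_THRESHOLDS)
  if seconds_since_last >= thresholds[:critical]
    2
  elsif seconds_since_last >= thresholds[:warning]
    1
  else
    0
  end
end

# A hypothetical client definition that relaxes the thresholds for one node.
def client_config(name, address, thresholds)
  {
    client: {
      name: name,
      address: address,
      subscriptions: ["base"],
      keepalive: { thresholds: thresholds }
    }
  }
end

puts JSON.pretty_generate(client_config("batch1", "10.0.0.12", { warning: 300, critical: 600 }))
```

So stopping sensu-client on a sandbox box should produce a warning after about two minutes and a critical about a minute after that.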
A handler is what takes action on an event, basically how the alert reaches a human.
All events are displayed to the Dashboard, regardless of handler.
Handlers can send events through a pipe, TCP, UDP, or AMQP, or pass them to a set of other handlers.
So let’s configure a handler to send an email notification out for an event. I went ahead and installed the `mailer.rb` plugin and gem deps for you. Make sure you are on the server for all of the following config steps.
Now let’s install the handler. Let’s use the ‘default’ handler config as a template, and copy it over to email.json
I’ve actually already installed the `mail` gem dependency for you, which you can see by issuing the above command.
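For reference, an `email.json` handler definition could look roughly like what this sketch prints. The install path for `mailer.rb` is an assumption, and the mailer's own settings (SMTP server, addresses) live in a separate config that is not shown here. A `pipe` handler receives the event as JSON on STDIN.

```ruby
require "json"

# Build a pipe handler definition named "email".
# The command path is an assumed location for the mailer.rb handler.
def email_handler_config(command = "/etc/sensu/handlers/mailer.rb")
  {
    handlers: {
      email: {
        type: "pipe",
        command: command
      }
    }
  }
end

puts JSON.pretty_generate(email_handler_config)
```

After changing handler config, sensu-server needs a restart to pick it up.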
Now we need to set up a check to use the handler we just set up.
If you want to try this on your sensu sandbox, you’ll need to `yum install nc`, please don’t all try this right now :)
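The check shown on the slide is not reproduced in these notes, so here is a hypothetical one in the same spirit: a port check built on `nc` that routes its events to the `email` handler. The check name, port, subscription, and interval are all assumptions. Note that `nc -z` exits non-zero (typically 1) when the connection fails, which Sensu would report as a warning rather than a critical.

```ruby
require "json"

# Build a check definition that probes a local TCP port with nc and sends
# any resulting events to the named handlers.
def port_check(name, port, handlers: ["email"])
  {
    checks: {
      name => {
        command: "nc -z localhost #{port}",
        subscribers: ["base"],   # hypothetical subscription
        interval: 30,
        handlers: handlers
      }
    }
  }
end

puts JSON.pretty_generate(port_check("check_http_port", 80))
```

To trigger an alert, stop whatever is listening on that port and wait for the next check interval.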
guest/guest
Put Nginx In front of sensu-dashboard
Sensu dashboard runs on port 8080 and requires authentication, neither of which are yet configurable. We resolved this minor annoyance by running Nginx in front of the dashboard, proxying to 8080 and injecting authentication headers into Sensu so we don’t need to log in when viewing Sensu on our VPN.
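If the dashboard is using HTTP Basic authentication (an assumption here, as are the placeholder credentials), the header Nginx needs to inject is just a Base64 encoding of `user:password`. This sketch computes the value and prints the corresponding `proxy_set_header` line.

```ruby
# Build the value of an HTTP Basic Authorization header.
# pack("m0") is Ruby's built-in Base64 encoding without line breaks.
def basic_auth_header(user, password)
  "Basic " + ["#{user}:#{password}"].pack("m0")
end

# Placeholder credentials; substitute the dashboard's configured user.
puts %(proxy_set_header Authorization "#{basic_auth_header("admin", "secret")}";)
# prints: proxy_set_header Authorization "Basic YWRtaW46c2VjcmV0";
```

Only do this on a listener that is itself restricted (in our case, to the VPN), since anyone who can reach the proxy is logged in automatically.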
Making sensu-server redundant is easy - all you need to do is bring up more instances of sensu-server - but scaling out and making Redis and RabbitMQ highly available can be more challenging from an operational perspective. At Paperless, we are still running one solitary Redis instance for Sensu, but are comfortable with this because a) bootstrapping a new one with Chef would be trivial and b) the data it contains is not mission critical and could be easily re-generated and c) we’ve had zero performance or stability issues with it thus far.
Because RabbitMQ is a mission-critical piece of Sensu, we would like to, at some point, separate out Rabbit into a cluster with one disk node and one RAM node with HAProxy in front. However, I’ve never quite been able to get HAProxy tuned to Sensu’s liking. When and if I do, expect a blog post. If anyone here has experience running RabbitMQ clusters, I’d love to hear from you!