Automating Deployments with Bamboo and Ansible - Randall Thomson, Senior TechOps Engineer - LogicMonitor
LogicMonitor uses Atlassian Bamboo and Ansible to manage the deployment of applications throughout its microservice-based infrastructure. The process integrates tightly with the LogicMonitor API to programmatically set SDTs (scheduled downtimes) and OpsNotes. An additional integration with HipChat sends automated room notifications. Randall Thomson will speak on how the LogicMonitor TechOps team uses Ansible and Bamboo to empower its development team to safely and securely deploy applications to test and production environments.
AI Powered Full Stack Monitoring using Dynatrace - Himanshu Chhetri, CTO - Addteq
How do you effectively monitor the health of your Atlassian ecosystem and easily troubleshoot issues? Dynatrace, one of the monitoring tools recommended in Atlassian's enterprise documentation, can automatically detect performance issues in infrastructure and applications, and can even provide insights into user experience across the globe. Himanshu Chhetri will present insights and real-world use cases for using Dynatrace to monitor your mission-critical Atlassian tools.
2. Context
Automating Deployments with Bamboo and Ansible
• Bamboo - Atlassian product for building and deploying software
• Ansible - open-source (Red Hat) automation and configuration management software
4. What is a Pod?
All of the components required to provide LogicMonitor for customers:
Tomcat, Kafka, TSDB, MySQL, Relay
Global Resources: CloudProber, HAProxy, Redis, S3, SQS, ELBs, Sitemonitor, Proxy, SMTP, Render, ECSReporting, DNS
… what’s next? ElasticSearch, Rserver, CloudBilling
Horizontally scalable cell architecture
5. Conflict
• Scaling (infrastructure growth + microservices)
• Consistent Process
• Resilient Process
6. Solution
Empowering Developers (and others):
• Bamboo Deployment Plans (scaling)
• Ansible Playbooks (consistency)
• HipChat notifications (success/failure/verbose logging)
• Docker deploy agents (before it was a native Bamboo feature)
• LogicMonitor API (SDT + OpsNotes)
Overall Design (resiliency)
7. Tasks: Ansible vs Bamboo
Ansible Tasks Advantages:
• Can work with or without Bamboo
• Playbooks can be managed in a software repo
Bamboo Tasks Disadvantages:
• Challenging to SSH to multiple hosts (not everything is 100% containerized)
• Revision control / audit trail is not fully featured
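To illustrate the first Ansible advantage, here is a minimal sketch of a wrapper that a Bamboo script task could call to run a playbook. It assumes Bamboo's convention of exposing deployment variables to scripts as environment variables (`bamboo.deploy.environment` becomes `bamboo_deploy_environment`, `bamboo.deploy.release` becomes `bamboo_deploy_release`). The inventory file name, the extra-var names `deploy_env` and `app_version`, and the fallback defaults are hypothetical. When those variables are absent, the same command still works from a laptop without Bamboo.

```python
import json
import os
import shlex
import subprocess
import sys

# Hypothetical fallbacks used when the script runs outside Bamboo (e.g. from a laptop).
DEFAULTS = {"environment": "qa", "release": "local-dev"}


def build_ansible_command(env, playbook, inventory="inventory.py"):
    """Build an ansible-playbook command line from Bamboo-style environment variables."""
    environment = env.get("bamboo_deploy_environment", DEFAULTS["environment"])
    release = env.get("bamboo_deploy_release", DEFAULTS["release"])
    # Passing extra vars as JSON keeps values containing spaces (e.g. "Production US") intact.
    extra_vars = json.dumps({"deploy_env": environment, "app_version": release})
    return ["ansible-playbook", "-i", inventory, playbook, "--extra-vars", extra_vars]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        cmd = build_ansible_command(os.environ, playbook=sys.argv[1])
        print("Running:", shlex.join(cmd), flush=True)
        # Propagate ansible-playbook's exit code so Bamboo marks the deploy as failed.
        sys.exit(subprocess.call(cmd))
    print("usage: run_playbook.py PLAYBOOK", file=sys.stderr)
```

A script task might invoke this as `python3 run_playbook.py deploy.yml`. The upgrade logic lives in the playbook rather than in the Bamboo task, so it stays in version control with an audit trail.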
8. Deployment Types
• Automated
• Manual
Environment Types
• QA
• Stage
• Production
10. Timeline
- 2007-2011/12: We would put the “bleeping” JAR in the directory and run a Perl script to deploy (locally)
- 2012/2013: We introduced the notion of a WAR (also around the time we started using Stash and Bamboo for builds)
- 2014/2015: We started deploying things with Bamboo, but not to production
- 2016/2017: Developers deploy all production apps to production
11. Takeaways
OBJECTIVES: Automate, Automate, Automate
• Let the robots (and/or co-workers) do your job for you
• Minimize the number of methods needed for deployments
• Bamboo specs (deploy plans as code)
15. Monitoring Strategies
Alerting: Focus on customer-facing performance problems rather than individual metrics. Ideally, reduce alert fatigue.
Anomaly Detection: Automated baselines > built-in static thresholds > user-defined static thresholds
Visibility: Infrastructure components that comprise your Atlassian tool stack, e.g. application, database, load balancer.
Full Stack Monitoring: Infrastructure, logs, application performance management (APM), transaction monitoring (synthetic and real users).
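A toy comparison makes the Anomaly Detection ordering concrete. The sketch below contrasts a user-defined static threshold with the simplest possible automated baseline: the mean plus or minus k standard deviations of recent history. It only illustrates the idea and is not how Dynatrace or any other product actually computes baselines. The response-time numbers and `k=3` are made up.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """User-defined static threshold: alert whenever the value exceeds a fixed limit."""
    return value > threshold


def baseline_alert(history, value, k=3.0):
    """Toy automated baseline: alert when value is more than k standard deviations
    away from the mean of recent history."""
    if len(history) < 2:
        return False  # stdev needs at least two samples; no baseline learned yet
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # perfectly flat history: any change is unusual
    return abs(value - mu) > k * sigma


# Invented response-time histories (ms) for two services.
normal_200ms = [200, 210, 190, 205, 195]
normal_800ms = [800, 810, 790, 805, 795]
```

For the service that normally responds in about 200 ms, a jump to 400 ms is flagged by the baseline but missed by a 500 ms static threshold. The service that normally sits near 800 ms trips that same static threshold on every sample, which is the alert fatigue described under Alerting.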
16. Monitoring Tools *
Host / Service | APM | Log Monitoring | Full Stack
* By no means exhaustive but based on Addteq experience.
17. Monitoring Tools *
Host / Service: Built-in and community plugins. Based on user-defined static thresholds, e.g. alert when disk > 90% or HTTP response
APM: A Java agent needs to be loaded into the application being monitored. Features include JVM, database and web transaction monitoring.
18. Monitoring Tools *
Log Monitoring: Analyze application logs for errors, problematic plugins, service accounts making a lot of requests, etc.
Full Stack: OS-level agent to monitor infrastructure, APM, logs and user experience. AI-based automatic detection of problems with potential end-user impact.
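The sketch below illustrates one log-monitoring use case, finding service accounts that make a lot of requests, by counting requests per user in an access log. It assumes a simplified line layout of IP address, request ID, username and a bracketed timestamp, loosely modeled on Atlassian HTTP access logs. Check your own log format before relying on the regular expression. The sample lines used with it are invented.

```python
import re
from collections import Counter

# Assumed simplified layout: "<ip> <request-id> <username> [<timestamp>] ..."
ACCESS_LINE = re.compile(r"^(\S+) (\S+) (\S+) \[")


def top_requesters(lines, limit=3):
    """Count requests per username; '-' (anonymous) and unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LINE.match(line)
        if match and match.group(3) != "-":
            counts[match.group(3)] += 1
    return counts.most_common(limit)
```

A service account that dominates this list is a good candidate for a closer look at what it is polling and how often.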
19. Before AI
Alert fatigue caused by too many or missing alerts, and constant tuning of user-defined thresholds. Too many metrics to monitor constantly. Root cause analysis is time-consuming and tedious.
After AI
Automatically detect performance and availability issues. Pinpoint the potential root cause. Automate the analysis of large volumes of monitoring data, which can take many hours when performed manually by teams.
22. A user reported that JIRA issues in the project are very slow to load.
25. Which JIRA projects or Confluence spaces contribute most to users waiting for pages to be loaded?
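Outside a monitoring tool, one way to approach this question is to attribute each page load's duration to the Jira project or Confluence space in its URL and then sum per key. The sketch assumes the page-load data is available as (path, duration in ms) pairs, for example exported from real-user monitoring. It also assumes that issues live under `/browse/KEY-123` and pages under `/display/SPACE/...`. Other URL styles fall into an `(other)` bucket.

```python
import re
from collections import defaultdict

# Assumed URL shapes for Jira issues and Confluence pages.
JIRA_ISSUE = re.compile(r"^/browse/([A-Z][A-Z0-9_]*)-\d+")
CONFLUENCE_PAGE = re.compile(r"^/display/([^/]+)/")


def wait_time_by_container(page_loads):
    """Sum page-load time per Jira project key / Confluence space key, largest first."""
    totals = defaultdict(float)
    for path, duration_ms in page_loads:
        match = JIRA_ISSUE.match(path) or CONFLUENCE_PAGE.match(path)
        totals[match.group(1) if match else "(other)"] += duration_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Summing rather than averaging ranks a project highly when it is either slow or heavily used, which is closer to "contributes most to users waiting".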
31. Summary
Strategy: Focus on end-user impact and experience monitoring.
A.I.: Automatic problem detection and root cause detection can be very powerful!
Full Stack: Use full stack monitoring tools if available.
32. Addteq blog with more examples: http://bit.ly/dynatlasblog
Dynatrace free trial: https://www.dynatrace.com/trial/
Addteq’s JIRA integration for Dynatrace: https://marketplace.atlassian.com/1219206
Hello. I’m Randall Thomson, Sr. TechOps Engineer at LogicMonitor. “LogicMonitor provides SaaS-based IT infrastructure performance monitoring for cloud, data center and on-premises environments. It provides native support for thousands of devices and instances and is integrated with a wide range of IT tools such as ServiceNow, Puppet and Atlassian.”
Our TechOps team manages the infrastructure that provides the LogicMonitor service to our customers. We straddle the line between reactive and proactive work to ensure uptime while managing near-constant change and growth. This talk is about what our team has done to enable other teams to manage their own software deployments. I’m presenting our story because I'm proud of what we’ve built and believe it would be useful for other people or organizations to implement for themselves. At the time we were trying to solve these problems, there were not a lot of concrete answers on Google. So here goes…
I tend to jump right into the nitty-gritty, so I want to spend a couple of slides going over the two main subjects of this talk: Bamboo and Ansible (I will keep referring to these two things).
So I just want to take a quick raise-your-hand poll: (ask audience)
Who has heard of or has experience with using Bamboo for software deployments?
Who uses Ansible?
And does anyone use them together?
OK great. Now I want to give some background on how we got to this point and why we use BOTH products.
A variety of things were going on at LogicMonitor when I joined 2.5 years ago:
We had a monolithic app evolving into a microservice architecture
Monolith → microservices leads to more applications, more coordination and more complexity
Growth (more places to deploy the applications)
Time is precious, so we outsource tasks when we can
The Old Way didn’t scale.
A co-worker described it as “artisan crafted” server management
As an example, at one point we were SCP’ing a JAR around and using some Perl scripts executed on each host to manage deployments
One additional bit of context: we refer to all of the components required to provide LogicMonitor as a Pod.
As we add more customers, we must increase the number of Pods. If a new component is developed, it must be applied to all Pods. All Pods must be the same so that we don't have to worry about one-off problems.
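The "all Pods must be the same" rule can be checked mechanically. This minimal sketch assumes you can collect, for each Pod, a mapping of component name to deployed version (the Pod names and versions in the example are hypothetical). It reports every component whose version differs between Pods and treats a missing component as version `None`.

```python
def find_drift(pods):
    """Return {component: {version: [pods]}} for every component that is not
    identical on all Pods. A component absent from a Pod is recorded as None."""
    components = sorted({name for versions in pods.values() for name in versions})
    drift = {}
    for component in components:
        by_version = {}
        for pod in sorted(pods):
            by_version.setdefault(pods[pod].get(component), []).append(pod)
        if len(by_version) > 1:  # more than one version (or missing) => drift
            drift[component] = by_version
    return drift
```

Running such a check after each deployment turns a one-off Pod into a visible report instead of a surprise during an incident.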
So this leads us to a summary of our challenges.
Needed consistent and reliable deployments
Scaling with infrastructure growth as well as the increase in microservices (i.e., the number of apps you need to deploy); the old processes were not scaling with either of these growth factors
Deployment process should be consistent irrespective of environments (and if possible applications)
How do you provide a resilient deployment process while not relying upon your team to be human keyboards?
This brings us to our solution.
Enabled developers to click a button in order to deploy their components (to QA, staging and Production!)
Operations can enforce controls and processes - this is critical not only for maintaining sanity but also for ensuring compliance with security policies
HipChat notifications provide company-wide visibility of deployments - this is useful because….
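Here is a minimal sketch of such a notification. It assumes HipChat's v2 room-notification endpoint (`POST /v2/room/{room}/notification` with a bearer token and a JSON body carrying `message`, `color`, `notify` and `message_format`). HipChat has since been discontinued, so treat the endpoint as historical. The message wording, the choice to notify only on failure, and the function names are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request


def hipchat_payload(app, version, environment, succeeded, log_url=None):
    """Build a HipChat v2 room-notification body for one deployment result."""
    status = "succeeded" if succeeded else "FAILED"
    message = f"Deploy of {app} {version} to {environment} {status}"
    if log_url:
        message += f" (log: {log_url})"  # link to the verbose deploy log
    return {
        "message": message,
        "color": "green" if succeeded else "red",
        "notify": not succeeded,  # only ping room members when something breaks
        "message_format": "text",
    }


def send_notification(base_url, room, token, payload):
    """POST the payload to /v2/room/{room}/notification and return the HTTP status."""
    request = urllib.request.Request(
        f"{base_url}/v2/room/{urllib.parse.quote(room, safe='')}/notification",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```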
This Docker image is intended to provide a self-contained environment that can be used to deploy LogicMonitor applications.
Dockerizing the Ansible playbook meant we could control the Ansible version, include any custom modules, etc.
We then integrate with our own LogicMonitor API so we can set scheduled downtimes for our services to prevent pages from being sent out. In addition we log a note about the new version of the app being deployed which provides meaningful context when diagnosing issues (good or bad!).
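The sketch below builds the `LMv1` Authorization header that the LogicMonitor REST API uses: an HMAC-SHA256 over the HTTP verb, a millisecond epoch, the request body and the resource path, hex-encoded and then Base64-encoded. The OpsNote resource path (`/setting/opsnotes`) and the body fields are assumptions and should be checked against the current LogicMonitor REST API documentation. The access ID, key, app name and version are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time


def lmv1_auth_header(access_id, access_key, verb, resource_path, body="", epoch_ms=None):
    """Return an LMv1 Authorization header value for one LogicMonitor REST request."""
    epoch = str(epoch_ms if epoch_ms is not None else int(time.time() * 1000))
    # The signed string must contain exactly the same body text that is sent.
    message = verb + epoch + body + resource_path
    digest = hmac.new(access_key.encode(), message.encode(), hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"


def opsnote_body(app, version, environment):
    """Hypothetical OpsNote payload recording which version was deployed where."""
    return json.dumps({
        "note": f"Deployed {app} {version} to {environment}",
        "tags": [{"name": "deploy"}, {"name": app}],
    })
```

A deploy step would then POST `opsnote_body(...)` to `https://<company>.logicmonitor.com/santaba/rest/setting/opsnotes`, using the header from `lmv1_auth_header(access_id, access_key, "POST", "/setting/opsnotes", body)`. An SDT request follows the same signing pattern against its own resource path.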
At this point you may be wondering why the added complication of Ansible instead of using native Bamboo tasks. I’m going to cover that next.
The Complication - Why use Ansible?
Bamboo was not the starting place for our deployments. Initially they were run locally via Ansible playbooks; then Bamboo started running the playbooks. We were familiar with Ansible and the notion of feeding it an inventory file. We also moved to a dynamic inventory model, populated at runtime via Consul queries (worth mentioning up front as one of the neat things). So instead of porting all of our Ansible tasks to Bamboo tasks, we trigger Ansible via Bamboo. This also ensures that even someone with admin rights to a deploy plan cannot alter the code that’s doing the actual app upgrades.
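A dynamic inventory of this kind can be sketched as an executable inventory script. Ansible calls such a script with `--list` and expects JSON groups plus a `_meta.hostvars` section. Consul's catalog endpoint `/v1/catalog/service/<name>` returns the nodes that provide a service. The Consul address, the service names and the one-group-per-service layout are assumptions for illustration, not LogicMonitor's actual inventory.

```python
import json
import sys
import urllib.request

CONSUL_URL = "http://localhost:8500"  # assumed local Consul agent
SERVICES = ["santaba", "kafka"]       # hypothetical services; one Ansible group each


def catalog_to_inventory(catalog):
    """Turn {service: [Consul catalog entries]} into Ansible's --list JSON structure."""
    inventory = {"_meta": {"hostvars": {}}}
    for service, entries in catalog.items():
        hosts = inventory.setdefault(service, {"hosts": []})["hosts"]
        for entry in entries:
            node = entry["Node"]
            if node not in hosts:
                hosts.append(node)
            # ServiceAddress is empty when the service uses the node's own address.
            address = entry.get("ServiceAddress") or entry["Address"]
            inventory["_meta"]["hostvars"][node] = {"ansible_host": address}
    return inventory


def fetch_catalog(services):
    """Query Consul's catalog API once per service."""
    catalog = {}
    for service in services:
        with urllib.request.urlopen(f"{CONSUL_URL}/v1/catalog/service/{service}") as resp:
            catalog[service] = json.load(resp)
    return catalog


if __name__ == "__main__":
    if sys.argv[1:2] == ["--list"]:
        print(json.dumps(catalog_to_inventory(fetch_catalog(SERVICES))))
    else:
        # "--host <name>": hostvars are already provided via _meta, so nothing to add.
        print(json.dumps({}))
```

Because the inventory is computed at run time, a newly added Pod shows up in the next deployment without anyone editing an inventory file.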
We use both manual and automated Bamboo deployments. For our QA environments we typically auto-deploy based on, say, a commit to the develop branch. Staging environments may require a Jira ticket to be set to Approved before a Lambda function triggers the deployment. For most production environments, deploys are triggered manually by the component team lead. We have found this practice to work well: developers are often best positioned to understand the changes at play during a deployment, and they are often the ones required to mitigate any issues that arise afterward.
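The trigger rules in the previous paragraph amount to a small decision table. The sketch below writes them down as a single function, which can be a useful way to review or test such a policy. The `DeployEvent` fields, environment names and the `Approved` ticket status are assumptions. In the setup described, the decision is spread across Bamboo triggers, a Lambda function and a team lead clicking Deploy, not a single function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployEvent:
    environment: str                     # "qa", "staging" or "production"
    branch: Optional[str] = None         # branch whose commit produced the build
    ticket_status: Optional[str] = None  # status of the linked Jira ticket, if any
    manual_by_lead: bool = False         # the component team lead clicked Deploy


def should_deploy(event: DeployEvent) -> bool:
    """Hypothetical encoding of the QA / staging / production trigger rules."""
    if event.environment == "qa":
        return event.branch == "develop"          # auto-deploy on commits to develop
    if event.environment == "staging":
        return event.ticket_status == "Approved"  # gate on an approved Jira ticket
    if event.environment == "production":
        return event.manual_by_lead               # manual, by the component team lead
    return False                                  # unknown environments never deploy
```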
So where are we at today? And were mistakes made?
All projects (even non-customer facing) are (should be :)) hosted in Bitbucket. We use Jira to enforce restrictions on branch commits, for example requiring two senior member approvals before a PR merge to master. We use Bamboo to both build and deploy software. Ansible governs the process required by each microservice app to facilitate a smooth upgrade. LogicMonitor ensures the app is providing the expected service to customers. The Ansible playbook also sets SDTs so nobody gets paged and OpsNotes so that we can correlate events along our monitoring graphs.
Let’s review our timeline and I’ll cover a few mistakes.
Mistake - copy/paste “bugs” - early on we had a production deployment plan using the develop branch release…
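One cheap guard against this class of copy/paste bug is a pre-deploy check that refuses to ship a release built from the wrong branch. The sketch below is hypothetical: the environment-to-branch mapping is illustrative. It expects the environment and source branch as command-line arguments, for instance `${bamboo.deploy.environment}` plus whatever variable carries the release's source branch in your setup.

```python
import sys

# Hypothetical policy: which source branches may be deployed to each environment.
ALLOWED_BRANCHES = {
    "qa": {"develop"},
    "stage": {"master"},
    "production": {"master"},
}


def check_release_branch(environment, branch):
    """Raise ValueError if a build from `branch` must not be deployed to `environment`."""
    allowed = ALLOWED_BRANCHES.get(environment.strip().lower())
    if allowed is None:
        raise ValueError(f"unknown environment {environment!r}")
    if branch not in allowed:
        raise ValueError(
            f"refusing to deploy a build from {branch!r} to {environment!r}; "
            f"allowed branches: {sorted(allowed)}"
        )


if __name__ == "__main__" and len(sys.argv) == 3:
    try:
        check_release_branch(sys.argv[1], sys.argv[2])
    except ValueError as err:
        sys.exit(f"ERROR: {err}")  # non-zero exit fails the Bamboo deployment
    print("branch check passed")
```

Running the check as the first task of every deploy plan means a plan cloned from a QA plan fails loudly instead of quietly shipping the develop branch to production.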
Here are some things I hope you can take home from this.
Let the robots (and/or co-workers) do your job for you. Enable someone else. They probably don’t need to know how the app is deployed, but they will be better suited to understand if something goes wrong with the application.
Minimize the number of methods needed for deployments - the more common ground you can find between applications, the better. Using deployment environment variables is one example.
Bamboo specs - they will help you further improve organization, enforce standards and scale your deployment plans alongside infrastructure growth.
Questions? If time permits, a brief demo of a Santaba QA deployment.