29. BEST OF BREED PRODUCTS
Build trust with every incident
STATUSPAGE MISSION
30. Mobile App Unresponsive
Investigating - We’re currently experiencing an issue with users unable to log into
our mobile application. We’re actively looking into the issue and will have an
update in the next 30 minutes.
Uptime Showcase
Wednesday, April 10
8:20
[Banc.ly Site Status] Investigating: Site instability http://stspg.com/21ak
MESSAGES
31. Investigating - We’re currently experiencing an issue with users unable to log into
our mobile application. We’re actively looking into the issue and will have an
update in the next 30 minutes.
Uptime Showcase
34. Escalations
Banc.ly backend weekday
Banc.ly backend weekend
0m On-call users in Banc.ly backend, if not acknowledged
5m Sarah Smith, if not acknowledged
10m Ryan Windows, if not acknowledged
15m Evgeny Willows, if not acknowledged
20m Banc.ly MIMs, if not acknowledged
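The escalation steps above can be read as a simple time-ordered list: the longer an alert stays unacknowledged, the more people have been notified. The following is a minimal sketch of that idea, not Opsgenie's implementation. The step values are copied from the slide; `EscalationStep`, `BACKEND_ESCALATION`, and `notified_so_far` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    delay_minutes: int  # minutes the alert has gone unacknowledged
    notify: str         # who is notified once that delay is reached


# Step values taken from the slide; the structure itself is an assumption.
BACKEND_ESCALATION = [
    EscalationStep(0, "On-call users in Banc.ly backend"),
    EscalationStep(5, "Sarah Smith"),
    EscalationStep(10, "Ryan Windows"),
    EscalationStep(15, "Evgeny Willows"),
    EscalationStep(20, "Banc.ly MIMs"),
]


def notified_so_far(steps, minutes_unacknowledged):
    """Return everyone notified while the alert has stayed unacknowledged."""
    return [s.notify for s in steps if s.delay_minutes <= minutes_unacknowledged]
```

For example, an alert that has gone 12 minutes without acknowledgement would have reached the on-call users, Sarah Smith, and Ryan Windows, but not yet Evgeny Willows.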
Routing rules
For any received alert:
IF routing time is between Friday 18:00 AND Monday 06:00
THEN route the alert to Banc.ly backend weekend
ELSE route alerts to Banc.ly backend weekday
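The routing rule on this slide is a time-window check: alerts received between Friday 18:00 and Monday 06:00 go to the weekend escalation, everything else to the weekday one. A minimal sketch of that logic follows. It is not Opsgenie's API; `in_weekend_window` and `route_alert` are hypothetical helpers, and treating Monday 06:00 itself as already outside the window is an assumption, since the slide does not say whether the end of the range is inclusive.

```python
from datetime import datetime

WEEKEND_POLICY = "Banc.ly backend weekend"
WEEKDAY_POLICY = "Banc.ly backend weekday"


def in_weekend_window(ts: datetime) -> bool:
    """True when ts falls between Friday 18:00 and Monday 06:00 (end exclusive)."""
    day, hour = ts.weekday(), ts.hour  # Monday is 0, Sunday is 6
    if day == 4:           # Friday: from 18:00 onward
        return hour >= 18
    if day in (5, 6):      # Saturday and Sunday: all day
        return True
    if day == 0:           # Monday: until 06:00
        return hour < 6
    return False


def route_alert(received_at: datetime) -> str:
    """Pick the escalation that should receive an alert arriving at received_at."""
    return WEEKEND_POLICY if in_weekend_window(received_at) else WEEKDAY_POLICY
```

An alert arriving on a Saturday night would be routed to the weekend escalation, while one arriving on a Wednesday morning would go to the weekday escalation.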
36. Timeline
Am checking for possums in the Google tracts, as they have infested us before.
Josie Michaels · 01:14
The curse has not yet been lifted from the
Liam Hens · 00:34
The defragulator is checked and is not the source of the problem. Frag lines are flowing smoothly.
Mark Kane · 01:24
We have now fully cleared out the Login blockage. It seems that Google was full of possums again. We reset our API tokens and drained all cisterns of the pestilence but we will remain ever vigilant.
Josie Michaels · 01:16
Mary Smith · 01:30 · Incident resolved
We have now fully restored service to all of our customers. We will continue to monitor the login services to ensure no further issues.
Resolved
16:45 (UTC +8) · Statuspage updated · Mary Smith
Postmortem: Incident #10 - Banc.ly site down for customers
Executive summary
Login with Google has been unavailable for over 15 mins now. Google services are still running, so it seems to be something on our end.
Leadup
The hydrospanner became stuck in the Google pipeline. Despite heroic efforts to free said spanner this led to a blockage 2 weeks ago.
Fault
The pressure due to this blockage grew until approximately 7pm 21 Feb 2019, when there was an overflow of possums in the Google pipeline. Obviously, this led to an outage of Google logins.
38. Banc.ly Backend
Software project
Create subscription plans and discount codes in Stripe (BBE-945)
Add link to app usage (GA) in email report (BBE-935)
Force SSL on any page that contains account info (BBE-1029)
Add analytics to pricing page (BBE-939)
IN PROGRESS 4
Apply a prorated discount to a user when they move from a low to a high priced tier (BBE-1021)
Allow users to change between two tiers at the same price (BBE-973)
Add NPS feedback to email report (BBE-1004)
Add NPS feedback to wallboard (BBE-961)
Implement feedback collector (BBE-321)
TO DO 29 · DONE 3
Schedule weekly email report for Monday mornings to all staff (BBE-732)
Automate collection of feedback for weekly email report (BBE-931)
Install SSL certificate (BBE-983)
Queues
Banc.ly Infras…
Service desk project
All open: 11
All unassigned: 2
Assigned to me: 5
+ Add queue
Sharon Tweed raised this request via Portal
View request in portal
Activity Show all
Sandbox environment for testing changes
Status: Waiting for support
Assignee: Oleg Jobbs
Reporter: Sharon Tweed
AWS product: EC2 Linux
Created 8 May 2017 5:43 PM
Last updated 4 hours ago
Sharon Tweed, 17 Dec 2017: Hi, I need this provisioned for testing my changes in staging.
Oleg Jobbs, 17 Dec 2017: Hey Sharon, you requested a t3.micro instance type. I would suggest a t3.small type as it will be better suited for what you’re trying to do.
INFRA-123
CRITICAL
Incidents
Performing maintenance on our file sync systems for the entire
weekend
BEGINNING 4 OCT 2018 (01:30 PDT)
SCHEDULED
CRITICAL
Performing maintenance on our file sync systems for the entire
weekend
10 MINS AGO (08:30 UTC)
Component group name - component name long lorem Short name
IN PROGRESS
2
Banc.ly
Public site
DEEPLY INTEGRATED
40. Service desks
Human resources
We can help with new employee
onboarding and general queries.
IT Support
We can help with any
regarding your comp
Welcome to the Banc.ly Help Cent
Find help and services
Status update
Mobile app users having trouble logging in View Status page
Status update
Mobile app users having trouble logging in View in Statuspage
41. Incidents / Incident #10
Banc.ly site down for customers - 500 errors on /deposit/v2 API
Mar 9, 2019 11:52 PM
Backend API Integration +4
Elapsed time: 4h 4m 38s
P2
Details · Associated alerts · Responders · Stakeholders
Team: Banc.ly backend
Service: bancly-backend-api
Description: Banc.ly site is down for customers. We’re seeing a large number of 500 errors in the CloudWatch logs due to errors on /deposit/v2 API.
> Rate limiting has prevented the flux capacitors from receiving stream notifications.
Priority: P2 - High
Jira issues
Create new issue · Link existing issue
Join command center
Open
Incident response roles
Incident commander: Josie Michaels
Communications officer: Helena Carter
Timeline
It appears flux rate limits were set in error, so the team is testing a restore of the previous configuration.
Josie Michaels · 01:14
The elevated error rate appears to be due to incorrect rate limits on the flux capacitor stream.
Liam Chaudhury · 00:34
Saturday 9 March 2019
Mary Smith · 23:54 · Stakeholders updated
We have identified a problem with the deposit API, and are working to determine a fix.
Website down due to deposit API errors.
23:45 · Site reliability alerted
500 error threshold exceeded on /deposit/v2 API #1094
500 error threshold exceeded on /deposit/v2 API #1094
Mary Smith · 23:48 · Associated alert acked
Mary Smith · 23:52 · Incident opened
We have now reinstated the previous flux rate limit levels to allow sufficient traffic through the nets. Levels of API traffic are returning to normal despite some heavy errors in underlying services.
Josie Michaels · 01:16
Opsgenie + JSW logo
42. Priority: P2 - High
Jira issues
Create new issue · Link existing issue
Incident response roles
Incident commander: Josie Michaels
Communications officer: Helena Carter
Banc.ly backend
What needs to be fixed? Add error handling to deposit API for invalid tuple length
Create · Cancel
43. Priority: P2 - High
Incident response roles
Incident commander: Josie Michaels
Communications officer: Helena Carter
Jira issues
BBE-1227 · TO DO · Add error handling to deposit API for invalid tuple length
BBE-1228 · TO DO · Fix alerting rules to notify devs when rate limits exceeded
Create new issue · Link existing issue
44. Banc.ly site down for customers - 500 errors on /deposit/v2 API - postmortem report
Executive summary
Login with Google has been unavailable for over 15 mins now. Google services are still running, so it seems to be something on our end.
Leadup
The hydrospanner became stuck in the Google pipeline. Despite heroic efforts to free said spanner this led to a blockage 2 weeks ago.
Fault
The pressure due to this blockage grew until approximately 7pm 21 Feb 2019, when there was an overflow of possums in the Google pipeline. Obviously, this led to an outage of Google logins.
Detection
This outage was first detected by New Relic. Simo Nalakorn was then alerted and acknowledged the alert at 7:21pm.
Root causes
Thresholds were exceeded. We ultimately performed inadequate checks of this pipeline.
Mitigation and resolution
Defragging the pipeline cleared the possums, allowing us to restart it. Login service restored at 7:51pm.
Priority: P2 - High
Incident response roles
Incident commander
Communications officer
Timeline · Details
Am checking for possums in the
they have infested us before.
Josie Michaels · 01:34
The curse has not yet been lifted
am continuing to search.
Josie Michaels · 01:20
The defragulator is checked and
of the problem. Frag lines are flow
Josie Michaels · 01:47
We have now fully cleared out the
It seems that Google was full of p
reset our API tokens and drained
pestilence but we will remain eve
Josie Michaels · 01:45
Mary · 01:50 · Incident resolved
16:45 (UTC +8) · Statuspage upd
We have now fully rest
of our customers. We will continu
login services to ensure no furthe
Resolved
Jira issues
BBE-1227 · TO DO · Add error handling to deposit API for invalid tuple length
BBE-1228 · TO DO · Fix alerting rules to notify devs when rate limits exceeded
Create new issue Link existing issue
45. Banc.ly
Opsgenie APP 6:53PM
Josie closed alert #5684 “ALARM: PROD - backend-api 5xx error threshold exceeded”
Opsgenie APP 6:53PM
Josie added a note to incident #23: “Team has identified a problem in event stream error handling.”
dana 6:49PM
Have we checked all of the servers yet?
scott 7:39PM
Yeah, it went down last week
xander 6:59PM
This isn’t the first time Kinesis has gone down, right?
xander 8:01PM
Ah
Do we know why?
The event stream data had some invalid records. We need to fix the error handling and alerting.
dana 8:02PM
josie 8:02PM
!
INC #23: Banc.ly site down for customers - 500 errors on /deposit/v2 API
josie (you)
#inc-23
46. INCIDENTS ALERT FATIGUE IMPACTING THE IT TEAM
ITOps Team
Business systems / infrastructure
Monitoring
+
Detect Respond
49. HOW ATLASSIANS MONITOR THEIR ENTERPRISE DEPLOYMENTS
https://confluence.atlassian.com/enterprise/how-atlassians-monitor-their-enterprise-deployments-947849816.html
Find out how we use Data Center ourselves for:
• getsupport.atlassian.com
• jira.atlassian.com
To date, both instances track a total of 1.9 million tickets,
with a combined user base of around 4.8 million users.
54. Service Desk
Create a project
Start with a service desk and
build it the way you want.
Change template What’s in this?
Name
Open (recommended)
Access
Legal
Create
57. Rule · Transition · Done status · In-progress status · To-do status
Discard · Save & close · Review a contract
Legal
START
TO DO
Start work
Create request
Publish · DONE · IN PROGRESS
START
TO DO
IN REVIEW
Start work
Create request
Review contract
Approve
Declined · Request more info
APPROVED
DECLINED
CANCELLED
ANY STATUS
IN PROGRESS
59. Bancly support
Legal
Add a Statement of Work (SOW) to an Existing Agreement
Add additional vendor services for an existing agreement
Request a Non-Disclosure Agreement
Protect confidential information using Banc.ly’s form NDA
Review a contract
Request a legal review of a contract
What can we help you with?
Have a legal request? Raise a request here.
60. What can we help you with?
Review a contract
Request a legal review of a contract
Bancly support
Legal
Summary*
Contract value
Attachment
Send Cancel
What is the purpose of the contract?
Drag and drop files, paste screenshots, or browse
Browse
67. “[Jira Align] allows us to connect our
teams to strategy, and that has been
critical to our transformation.”
— Candace Kelly, AT&T Center of Excellence