Navpreet Singh, Technical Resolution lead, shares how John Hancock monitors their applications.
See the video here: https://youtu.be/Vb2o_DoG1hU
1. We operate as John Hancock in the United States, and Manulife in other parts of the world.
The John Hancock Monitoring Story:
Implementation OR Adaptation?
What does it take to succeed with New Relic?
September 2017
2. Navpreet Singh
Head of Technical Resolution at John Hancock
3. Manulife & John Hancock
A global company
22 million customers, 35,000 employees, 70,000 agents, thousands of distribution partners
Global Assets Under Management and Administration exceeded $1 trillion in the first quarter of 2017
Source: http://www.manulife.com/Our-Story
4. Technology Landscape @ John Hancock
150-year-old business
Early IT adopter
Using mainframe, cloud, and everything in between
Mainframe, COBOL, Microfocus…
VB, PB, Progress, VFP…
Java, .Net, Ruby, Node, Angular, React, PHP…
Windows, Linux, Solaris, AIX…
SQL Server, Oracle, DB2, MySQL…
Serverless, microservices in cloud
…And every version of these!
5. Technology Landscape @ John Hancock
600+ applications developed both in-house and with vendors
Hosted on multiple models
Thousands of IT/IS professionals
6. The Manulife/John Hancock Reality
Before New Relic
7. Disparate Monitoring Solutions
Many different approaches to monitor applications
No monitoring software for many applications
Basic hardware monitoring for ops and vendors
But…
Applications talk to each other all the time!
Result: Large holes in end-to-end monitoring
8. Example Scenarios: Performance Issues
Web page loading slowly
Batch process running slowly
Don’t know whether it is a CPU, RAM, disk, SQL, or app issue
Dev team can only access app logs; can’t capture CPU/RAM usage
Need server admin & DBA
Meet service admin to capture CPU/RAM usage
Wait for assigned admins to respond
Takes hours to days just to obtain data before troubleshooting
9. Example Scenarios: Application Errors
Web page errors
App layer / Business layer errors
SQL errors
Dev team uses app logs; limited insight
Need to reproduce the issue in lower environments and debug the code
Time-consuming exercise; lack of real-time trace
Web page -> App component -> SQL invoked from App
Lack of thread-level tracing detail for performance issues
Need architect / admins
10. Increased Priority Incidents = Need for Better Monitoring
Move from reactive to proactive
Resolve issues quickly
Improve understanding of application behavior
Improve visibility into applications in production
We needed a central monitoring standard
Enter New Relic
11. We’re All a Product of Our Environment!
What Else Was Happening When New Relic Was Being Introduced?
12. What Else Was Happening?
Move to Cloud
Predominantly Azure IaaS with some PaaS, App Service
Some AWS
Move to Agile
Largely Scrum, SAFe with some advanced concepts like TDD+Pairing
Push to DevOps
New Relic push aligns with DevOps and Agile
13. CIO/COO sets a Clear Goal!
An aggressive, clear, & unambiguous goal:
All applications in Production must be monitored by New Relic within one year
14. What’s Next?
What’s the right team structure?
Who should own monitoring setup and responsibilities?
16. Monitoring Ownership options
A specialized central monitoring team focused on application monitoring
Ops team owns all monitoring, drives it with the application teams
Each app team owns setting up monitoring
17. Our Ownership Solution at JH: It’s a Hybrid!
Each app team owns setting up monitoring for their applications
Center of Excellence set up to drive the effort
Culture change – very important. This distinguishes adaptation from a simple software implementation
For one BU with 100+ apps, a central monitoring team established within the BU
18. Engagement Methodology with App Teams
1st set of Meetings: New Relic Buy-in
2nd set of Meetings: App’s Tech
Proposal: App + New Relic = Great Things!
Periodic Check-ins
19. Adaptation: Best Practices & Suggestions
Culture Change
Get Buy-In
Highlight the Wins & Success Stories to Top Leadership
Nurture an Internal Community
Monitoring Maturity Curve
Different types of monitoring
Alerts – Getting them right
Insights – IT Analytics
Insights – Business Analytics
20. Agile mindset to the project
Bias towards action
Don’t sit in a room discussing / researching until you know all the answers
Figure out enough to get started, start executing, find answers in the process – Inspect and Adapt
21. Metrics Highlighted to Track Progress
Progress shared monthly with all Senior IT Leaders
Metrics showed:
# of users
Growth over a period
% Apps by Status
Monthly growth by BU

Agent Type                               Min. Contracted   Apr    May    Jun
APM (Application Performance Monitors)   264               61     98     126
Servers                                  Unlimited         575    675    725
Mobile Apps                              250000            0      0      298
Browser (Million Checks)                 75                1.5    8.3    11
Synthetic* (Million Checks)              1.5               1.4    1.4    0.7
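The contracted minimums and monthly figures in this table lend themselves to a simple utilization calculation. A minimal Python sketch, using only the numbers from the table; the dictionary layout and the `utilization` helper are illustrative assumptions, not part of the original monthly report:

```python
# Figures copied from the slide's table: (minimum contracted, Apr, May, Jun).
# "Servers" is omitted because its contracted amount is unlimited.
usage = {
    "APM": (264, 61, 98, 126),
    "Mobile Apps": (250000, 0, 0, 298),
    "Browser (million checks)": (75, 1.5, 8.3, 11),
    "Synthetic (million checks)": (1.5, 1.4, 1.4, 0.7),
}


def utilization(contracted, used):
    """Return usage as a percentage of the contracted minimum."""
    return 100.0 * used / contracted


for agent, (contracted, apr, may, jun) in usage.items():
    print(f"{agent}: Jun at {utilization(contracted, jun):.1f}% of contract, "
          f"Apr-Jun change {jun - apr:+g}")
```

A report like this makes it easy to see which agent types are still far below their contracted minimum and where adoption is growing month over month.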
[Chart: ‘In Progress’ and ‘Completed’ apps, JH DA, Jan-17 to Jun-17]
22. Speed Bumps?
Before You Can Live Happily Ever After…
23. Some speed bumps we faced
Firewall – took a long time to resolve internally
SSL issue with older Java apps
Sweet spot – Great with tech within the last 20-30 years and upcoming technologies
IBM technologies
PMI Metrics with WebSphere
Private Locations Azure deployable image
Server Agents (& breadth)
24. Some Happy Endings…
25. Results - Success Stories
APM: A group improved page performance by 3 secs per page load by identifying tuning opportunities with a SQL executed multiple times for every page load
Synthetics: A group identified a 100+ MB static file was being served by webservers in MA instead of Akamai CDN
SQL Server Plugin: A team identified their Page Life Expectancy had deteriorated drastically since DB moved to new server, indicating inadequate RAM allocated
Insights: A team identified uneven load distribution across servers was causing severely degraded performance
Server API+Synthetics: A team uses alerts on memory exhaustion to avoid what used to be definite downtime
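The Insights story above (uneven load across servers) comes down to comparing each host's share of traffic with an even split. A minimal Python sketch, assuming per-host request counts have already been exported; the host names, counts, and the 1.5x tolerance are hypothetical, and this is not the team's actual query:

```python
def overloaded_hosts(requests_by_host, tolerance=1.5):
    """Return hosts whose request count exceeds `tolerance` times the even share."""
    total = sum(requests_by_host.values())
    if total == 0:
        return []
    fair_share = total / len(requests_by_host)
    return sorted(
        host for host, count in requests_by_host.items()
        if count > tolerance * fair_share
    )


# Hypothetical sample: one of four servers takes most of the traffic.
sample = {"web-1": 7000, "web-2": 1000, "web-3": 1000, "web-4": 1000}
print(overloaded_hosts(sample))  # ['web-1']
```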
26. Going Forward… The Journey Continues
Recently Acquired Infrastructure Product
NR Software Analysis Review
NR Expert Services
Increased Insights Retention Period
Miles to go…
TR team responsibilities include:
Last escalation level for long-standing complex technical issues
Technical Innovation
Software Delivery Process Innovation
Technology Enablement – Drive & oversee initial adaptation phase of complex technical acquisitions that have division-wide impact
For example, New Relic
TR team owns the initial phase of the New Relic implementation at John Hancock
Manulife operates as John Hancock in the United States, and Manulife elsewhere
One company. Two brands.
Financial advice, insurance and wealth and asset management solutions for individuals, groups and institutions around the world.
22 million customers, 35,000 employees, 70,000 agents, thousands of distribution partners
Global Assets Under Management and Administration exceeded $1 trillion in the first quarter of 2017
A History Tidbit:
Canada’s first Prime Minister, Sir John A. Macdonald, was also our Company’s first president
150-year-old business
An early IT adopter
From Mainframes to Serverless Architecture in cloud – and everything in between – really, everything!
Mainframe, Microfocus
Client Server (VB, PB, Progress, VFP etc.)
Web technologies (Java, .Net, Ruby, Node.js, AngularJS, PHP etc.).
Windows, Linux, Solaris, AIX.
SQL Server, Oracle, DB2….
And almost every version of these!
600+ Applications, multiple environments in Prod and Non-Prod
Applications developed both in-house + many vendors.
Apps hosted with multiple models
On Premise in Data Centers
Increasing # in Cloud
ASP vendor managed
SaaS
Thousands of IT/IS professionals
Disparate Monitoring solutions
Many different software tools to monitor their applications
No monitoring software for many applications
Basic hardware monitoring for Ops and Vendors
Information access closely guarded to select few
Server, not the Application, monitoring predominant
But …..
Applications talk to each other all the time!
Result: Large holes in end-to-end monitoring
Whichever software made sense for their individual applications/tiers
Teams selected their own monitoring software for their applications and tiers – based on their needs at the time
Increase in priority incidents highlighted the need to:
Move from reactive to proactive mode – prevent issues
Resolve issues quickly when they do occur
Improve understanding of application behavior across tiers & across applications
Improve visibility into how applications performed in Production
Requirement for central monitoring standard
End-to-end visibility
Application development team
Support team
Move to Cloud :
Big push to move applications to Cloud
Predominantly Azure IaaS with some PaaS, App Service
Some AWS
Move to Agile
Deliberate and definite paradigm shift towards Agile
Largely Scrum, SAFe with some advanced concepts like TDD+Pairing
Push to DevOps
New Relic push aligns with DevOps and Agile
Goal: end-to-end monitoring solution which spans tiers, hardware, and software
Servers - Ops team has clear ownership
Applications – Not so clear
Ownership at the Business Unit/team level
Application Owners/teams identified within each BU own one or more applications
Central Ops teams and Platform teams own hardware or common applications
Ops team owns all monitoring, drives it with the application teams
Some Pros; Many Cons:
Hardware/Server perspective to Monitoring
Knowledge/expertise about applications, tech stacks, etc
How will app teams react to Ops team driving this?
A specialized central monitoring team focused on application monitoring
Many Pros; Some significant Cons:
Most important: Acquire-Install-Hope Vs Adaptation
A behavior change is needed to truly utilize benefits of monitoring. Harder to drive the adaptation in this model
No Pain, No Gain: App teams don’t go through the hard work, resulting in limited learning
Each app team owns setting up monitoring
Apt for Adaptation
Longer ramp with long term gains
Learning curve longer, more involved
Delayed gratification
Instills mindset into each team
Support structure needed for each app team
Program Management
Technical Guidance
Issue: Not easy to scale
The Rule: Each app team owns setting up monitoring for their applications
App team owns the “Do” e.g. installs, configurations, etc.
Center of Excellence set up to drive the effort
Run Program in a decentralized fashion
Technical Support
Owns convincing, guiding, training, resolving technical issues, building expertise in-house etc.
Culture change – very important. This distinguishes adaptation from a simple software implementation
The Exception: For one BU where the number of applications is very high
A central monitoring team established within the BU
This team works with the Center of Excellence to drive the monitoring adaptation
1st set of Meetings: New Relic Buy-in.
Understand the players & decision makers.
Demonstrate: Show an app similar to theirs already using New Relic
2nd set of Meetings: App’s Tech
App teams explain their tech stack, deployments, server layouts
Proposal: App + New Relic = Great Things!
Share a proposed monitoring Solution
How many of which agents to deploy, next steps, prioritization suggestions
Periodic check-ins
Ensure progress, adequate technical support & guidance through the monitoring maturity curve
Culture Change – Very important for adaptation rather than a simple software implementation
Get Buy-in, overcome resistance/reluctance
If you keep doing what you’ve been doing, you keep struggling with the same problems!
Highlight the Wins!
Take the horse to water…..
Monitoring Maturity Curve
Different types of monitoring, from easy to involved:
Server/Infrastructure -> Synthetics -> APM, Browser, Plugins, Mobile
Alerts – Getting them right. Fine tuning process
Insights – IT Analytics
Insights – Business Analytics
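The alert fine-tuning step in the maturity curve above often means firing on sustained problems rather than momentary spikes. One common approach is to require several consecutive breaches. A minimal Python sketch, assuming memory usage is sampled as percentages; the threshold, window, and sample values are hypothetical, and this is not New Relic's alerting API:

```python
def should_alert(samples, threshold=90.0, consecutive=3):
    """Return True if `consecutive` samples in a row exceed `threshold`."""
    streak = 0
    for value in samples:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False


# A single spike does not alert; a sustained climb does.
print(should_alert([70, 95, 72, 71]))      # False
print(should_alert([88, 91, 93, 96, 97]))  # True
```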
We shared a monthly report with all Senior IT leaders
Metrics showed:
# of users
Growth over a period:
in APM hosts
Server Agents
Mobile Apps
Browsers
Synthetics
% Apps by Status
Monthly growth by BU
Recently acquired Infrastructure Product
Getting rave reviews internally
Need to replace Server Monitors
NR Expert Services
NR Software Analysis Review
Increased Insights Retention period
IT Analytics
Business Analytics
Miles to go….!