Darrell Westbury, Director of Operational Analytics at Credit Suisse, presents on how the global bank collects five types of IT operations data, analyzes it and uses it to derive insights.
Operational Analytics at Credit Suisse from ThousandEyes Connect
1. Operational Analytics
“Data is the New Soil”
–David McCandless
October 16th 2015
Darrell Westbury
Director of Operational Analytics
2. About me….
§ I’m Darrell Westbury
§ I work in Global Technology Services for
Credit Suisse
§ Credit Suisse is a Fortune 200, Swiss-based
Investment Bank, with ~46,000 employees
worldwide
§ I’ve worked at CS for ~7 years
§ I’ve held various roles ranging from:
• Head of Storage Ops for the Americas
• Head of Capacity and Inventory Services
• Director of Operational Analytics (current)
3. “Data is the new Soil”
- David McCandless
4. What is Operational Analytics?
“Operations Analytics (OA) is an approach or method of
applying big data principles and data analytics to the IT
Operations realm”
“OA centers on discovering trends and patterns in high
volume, complex and noisy IT systems data and making
predictions that will help avoid impact to key services
where possible and ‘recover quickly *’ when issues do
occur”
(* reduced MTTR)
5. What Types of Data Does OA Target?
§ Machine Data: System and Application logs, System Events, Performance and Capacity Metrics…
§ Wire Data: Network Packet Captures that have been pre-decoded for ease of use
§ Agent Data: Intercepted system calls and Application Method Invocations
§ Synthetic Transactions: Simulated views of a customer’s experience while interacting with a service
§ Human Maintained: Asset Inventories, Lifecycle Status, Data classes, App Names and the people who manage them, etc.
6. So, What Does OA Actually Do?
Phase I: Data Onboarding
§ Identify Golden Data Sources
§ Extract and Transform (sometimes)
§ Load & Ingest
§ Manage Data Quality
§ Reference Data, Maturity Scales
§ Accountable Data Owners
[Chart: EFFORT, 0% to 100%; segment shown: Data Onboarding]
§ Track Progress with Score cards
§ Remove any Blockers
§ Update Data Documentation
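One way a data quality score card could be computed is sketched below in Python. This is only an assumption about what such a score might look like: the slides do not describe how Credit Suisse scores its sources, and the `completeness` function, the field names, and the inventory records are all hypothetical.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(field) not in (None, "") for field in required_fields)
        for rec in records
    )
    return complete / len(records)


# Hypothetical asset inventory rows (a "human maintained" data source).
inventory = [
    {"asset": "srv-001", "owner": "team-a", "lifecycle": "production"},
    {"asset": "srv-002", "owner": "", "lifecycle": "production"},
    {"asset": "srv-003", "owner": "team-b", "lifecycle": None},
    {"asset": "srv-004", "owner": "team-c", "lifecycle": "retired"},
]

score = completeness(inventory, ["asset", "owner", "lifecycle"])
print(f"Inventory completeness: {score:.0%}")
```

A score like this, tracked per data source over time, is one simple input a score card could use to show onboarding progress.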
7. So, What Does OA Actually Do?
Phase II: Data Science
§ Identify trends & seasonal patterns in data
§ Look for baselines & outliers using statistics and linear algebra techniques
[Chart: EFFORT, 0% to 100%; segments shown: Data Onboarding, Data Science]
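As an illustration of the baseline and outlier step, here is a minimal Python sketch. It assumes a robust baseline built from the median and the median absolute deviation (MAD); the slides do not say which statistics are actually used, and the `find_outliers` function, the 3.5 threshold, and the sample latencies are hypothetical.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold.

    Baseline = median of the series; spread = median absolute deviation (MAD).
    The 0.6745 factor scales MAD so the score is comparable to a standard
    z-score for normally distributed data.
    """
    baseline = median(values)
    mad = median(abs(v - baseline) for v in values)
    if mad == 0:
        return []  # no spread around the median, so there is nothing to score against
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - baseline) / mad) > threshold
    ]


# Hypothetical response times in milliseconds, one sample per interval.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 250, 101, 100]
outliers = find_outliers(latencies_ms)
print(outliers)
```

The median and MAD are used here instead of the mean and standard deviation because a single large spike would otherwise inflate the baseline it is being compared against.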
8. So, What Does OA Actually Do?
Phase III: Data Visualizations
[Chart: EFFORT, 0% to 100%; segments shown: Data Onboarding, Data Science, Data Visualization]
9. How are we using ThousandEyes?
1 Basic DNS Testing
Alert on internet domain name
resolution failures
2 Port Listener Health
Tickle a TCP port over the
internet to ensure it’s listening
and responding
3 Data Path Testing
Observe the end-to-end path
through the internet to a target
service and monitor for route
changes, packet loss, latency
& jitter
4 Page Load Testing
Ensure internet-facing web
sites and services are
responding correctly and
consistently
5 Synthetic Transactions
Test authentication and site
navigation; collect performance
stats at the page object level
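To make the first two test types concrete, the following Python sketch shows what a DNS resolution check and a TCP port "tickle" amount to at the socket level. It uses only the Python standard library and is not the ThousandEyes API; the function names `dns_resolves` and `port_is_listening` are hypothetical, and the demo targets a throwaway local listener so it runs without internet access.

```python
import socket


def dns_resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False  # resolution failure: the condition a DNS test alerts on


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


# Demo: start a local listener on an OS-assigned port, then probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

listening = port_is_listening("127.0.0.1", open_port)
print(f"Port {open_port} listening: {listening}")
listener.close()
```

A real monitoring service runs checks like these from many vantage points on a schedule; this sketch only shows the single-probe logic.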
10. Detecting Internet Client Access Issues
§ ThousandEyes probes began reporting a 50-100% drop in authentication responses from a
public-facing Web Service
§ Evidence of the issue was provided to Web Operations, who were unaware of it, as all of their
monitoring tools depicted what they believed to be a healthy and stable infrastructure (CPU,
Memory, Disk & Network I/O, Capacity, Performance, etc.)
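The kind of check that surfaces a drop like this can be sketched in Python as a comparison of the current probe success rate against a baseline. This is an illustrative assumption, not the alert rule used in the incident: the function names, the 50% alert threshold, and the probe results are hypothetical.

```python
def success_rate(results):
    """Fraction of probe results that succeeded (results is a list of bools)."""
    return sum(results) / len(results) if results else 0.0


def drop_from_baseline(baseline_rate, current_rate):
    """Relative drop versus the baseline, as a fraction between 0 and 1."""
    if baseline_rate <= 0:
        return 0.0
    return max(0.0, (baseline_rate - current_rate) / baseline_rate)


def should_alert(baseline_results, current_results, max_drop=0.5):
    """Alert when the success rate has fallen by at least max_drop."""
    drop = drop_from_baseline(
        success_rate(baseline_results), success_rate(current_results)
    )
    return drop >= max_drop


# Hypothetical authentication probe outcomes (True = response received).
baseline_window = [True] * 20
incident_window = [True] * 8 + [False] * 12  # 60% of probes got no response

alert = should_alert(baseline_window, incident_window)
print(f"Alert: {alert}")
```

Because the measurement is taken from outside, it can fire even when host-level metrics such as CPU and memory look healthy.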
11. Evidence of Improved Performance
§ Implemented a synthetic transaction to compare the relative End User Experience of using
the infrastructure with and without acceleration.
§ Collected concrete evidence of a ~33% service time / latency performance improvement
when accessing an accelerated URL
§ Also able to demonstrate relative smoothing of service time inconsistency
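The two comparisons described here, relative improvement in service time and reduced inconsistency, can be expressed in a few lines of Python. The samples below are made up for illustration and are not the measurements behind the slide; `relative_improvement` and the variable names are hypothetical.

```python
from statistics import mean, pstdev


def relative_improvement(before, after):
    """Fractional reduction in mean service time going from before to after."""
    return (mean(before) - mean(after)) / mean(before)


# Hypothetical page service times in milliseconds from a synthetic transaction.
direct_ms = [300, 340, 260, 380, 220]       # without acceleration
accelerated_ms = [200, 205, 195, 210, 190]  # with acceleration

improvement = relative_improvement(direct_ms, accelerated_ms)
spread_direct = pstdev(direct_ms)
spread_accelerated = pstdev(accelerated_ms)

print(f"Mean service time improvement: {improvement:.0%}")
print(f"Std dev without acceleration: {spread_direct:.1f} ms")
print(f"Std dev with acceleration: {spread_accelerated:.1f} ms")
```

The drop in standard deviation is one simple way to quantify the "smoothing" of service time that the slide mentions.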
12. Insight into a Production Incident
Pre-Incident, Paths Look OK
[Timeline axis: 10:45 AM to 11:00 AM]
13. Insight into a Production Incident
Alert Received - BGP Route Disruption
[Timeline axis: 10:45 AM to 11:00 AM]
14. Insight into a Production Incident
Internet Paths are Failing
[Timeline axis: 10:45 AM to 11:00 AM]
15. Insight into a Production Incident
No BGP Routes – SPoF Detected
[Timeline axis: 10:45 AM to 11:00 AM]
16. Insight into a Production Incident
Path Failover to DR Site Successful
[Timeline axis: 10:45 AM to 11:00 AM]
17. Some Final Thoughts…
§ ThousandEyes lets us run multiple types of tests at various levels
of sophistication and granularity from all over the world (DNS Test,
TCP Port Tickle, Internet Path Test, Page Loads, Full Synthetic
Transactions)
§ We see exactly how our clients are experiencing our services
(Internet Path, BGP Route health, packet loss, jitter & latency)
§ We receive alerts on any significant variations from our
established baselines (Paths, Routes, Performance, Service
Quality, etc.)
§ We’re leveraging real quantifiable data to take the guesswork and
subjectivity off the table
§ We’re empowering important business decisions