Measuring and Troubleshooting Performance of Global Data Centers at ServiceNow

Geoff Wade, Senior Network Engineer, explains how ServiceNow relies on ThousandEyes Cloud Agents for insight into datacenter availability and reachability, along with diagnostic data for ISP outages.

  1. Measuring and Troubleshooting Performance of Global Data Centers at ServiceNow
     Geoff Wade, Sr. Network Engineer
     © 2016 ServiceNow. All Rights Reserved. Confidential.
  2. Secure, Scalable and Always-On Enterprise Services
     • Security
     • Transparency
     • Worldwide Scalability
     • Cloud Architecture
  3. Infrastructure and Monitoring – Datacenters
     • 2N infrastructure across all datacenters
       – Independent, duplicate architecture at a physically mirrored location for each physical datacenter, to ensure fault tolerance
     • All datacenters are Tier 3+ sites
       – The infrastructure in these datacenters is concurrently maintainable, and any equipment failure within a datacenter results in zero downtime
     • Enterprise-class infrastructure and operations
     • Monitoring
       – Multiple internal monitoring solutions, but none shows the customer's view
       – ThousandEyes lets us monitor “the Internet” and isolate it as the fault: if ThousandEyes alerts while nothing else does, we know where to start
       – Some homegrown or lesser tools show limited views from outside
       – External monitoring of one or two datacenters is easy, but…
  4. Worldwide Scalability
  5. Infrastructure – Network Architecture
     Because all of this…
     [Network diagram: two sites (Site 1, Site 2), each with Internet connectivity, Border Routers A/B, Core-A through Core-D, DC Interconnect-A/B, Customer VPN-A/B, Load Balancer-A/B, Firewall-A/B, and Pod 1 (DSR-A/B, TOR-A/B, rack servers); internal and external traffic are marked]
  6. Because all of this doesn’t matter if customers…
     [Same network diagram]
  7. Because all of this doesn’t matter if customers can’t reach you!
     [Same network diagram]
  8. Because all of this doesn’t matter if customers can’t reach you!
     Knowing when some customers can’t reach you is crucial for fixing problems before they become a major crisis.
     [Same network diagram]
  9. [No text on this slide]
  10. [Same network diagram]
  11. [Same network diagram]
  12. How We Use ThousandEyes – SRE (Operations) Dashboard
      • Improving the SRE signal-to-noise ratio
        – SRE has to process *lots* of input
        – Logging into multiple screens in a given tool doesn’t work
        – ThousandEyes provides a single-page dashboard that quickly shows immediate alerts
        – The same page also shows some “normal” statistics, useful if we need to correlate with another problem
        – On display full-time in the SOC
        – Can easily drill down if something shows red
        – Configurable in case more or fewer tests are needed
  13. How We Use ThousandEyes – Active Alerts via E-Mail
      • E-mail alerts
        – Backup to the web GUI
        – Can show a single issue on the Internet, or…
        – …can be compiled to show a trend
  14. How We Use ThousandEyes – Other Views for Other Groups
      • Other groups (access controlled via log-in) can have their own specific tests:
        – ICMP reachability to our ISP interfaces
        – TCP (image load) from ADCs
        – Loading of specific pages, with more advanced tests
        – MX monitoring (soon)
        – BGP prefix testing and BGP route visualization
  15. How We Use ThousandEyes – BGP Route Visualization
      • Used by network engineers
      • Your own point of view (POV) is easy to track, but…
      • ThousandEyes (TE) provides a historical POV of traffic into your ASN
  16. How We Use ThousandEyes – BGP Route Visualization
      • Replay feature over a selected timeline
        – Diagnostics during an event
        – Forensics immediately after the event
        – Makes RFO (reason for outage) explanations easy
  17. How We Use ThousandEyes – BGP Route Visualization (same text as slide 16)
  18. How We Use ThousandEyes – One-hop view
      Sometimes the view from one hop away doesn’t clearly show where the problem is.
  19. How We Use ThousandEyes – BGP Route Visualization
      • The number of hops (away from you) can be expanded
      • Sometimes the problem isn’t with your connection, but upstream
      • We like that ThousandEyes nodes are often publicly accessible route servers
  20. How We Use ThousandEyes – BGP Route Visualization
      • The number of hops (away from you) can be expanded
      • …and then replayed (over the same time period as before)
  21. How We Use ThousandEyes – BGP Route Visualization
      • Focus: only the nodes that saw an issue…
      • …and only the links that saw the issue
  22. How We Use ThousandEyes – BGP Route Visualization
      • Provides links to other tests that might be related
  23. How We Use ThousandEyes – BGP Route Visualization
      • Can help isolate a failure, e.g., if the failure is seen with one destination prefix but not the other
  24. How We Use ThousandEyes – Summary; Future; Considerations
      • Summary
        – Provides an excellent view into your network
        – Used by many groups (but especially network engineers!)
        – Allows targeted monitoring of exactly what we need, without extra noise
        – Helps us isolate an event and determine its urgency while it is happening (and reduce MTTR)
        – Helps us figure out what happened after the event; very useful for customer RFOs
      • Future
        – Internet outage detection
        – Metrics for downtime, etc.
      • Considerations
        – Internal ThousandEyes nodes
  25. Thank you
      Geoff Wade, Sr. Network Engineer
      Geoff.Wade@servicenow.com
      servicenow.com
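
Slide 3's rule of thumb (if ThousandEyes alerts and no internal tool does, the Internet path is the place to start) can be expressed as a small triage function. The Python below is a minimal, hypothetical sketch: the source names and the choice to send mixed alerts to the datacenter are assumptions, not ServiceNow's actual alerting logic.

```python
from __future__ import annotations

# Hypothetical name for the external, outside-in monitoring source.
EXTERNAL_SOURCE = "thousandeyes"


def triage(active_alert_sources: set[str]) -> str:
    """Suggest where to start troubleshooting from the set of sources currently alerting."""
    external = EXTERNAL_SOURCE in active_alert_sources
    internal = bool(active_alert_sources - {EXTERNAL_SOURCE})
    if external and not internal:
        # Only the outside view is failing: start with the ISPs / Internet path.
        return "internet"
    if internal:
        # Internal monitoring is alerting too (assumption): start inside the datacenter.
        return "datacenter"
    return "none"


print(triage({"thousandeyes"}))                  # internet
print(triage({"thousandeyes", "internal-nms"}))  # datacenter
```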
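
Slide 13 notes that e-mail alerts can be compiled to show a trend. One simple way to do that, sketched here in Python under the assumption that each alert e-mail has already been parsed into a timestamp, is to count alerts per clock hour. The function name and sample timestamps are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime


def hourly_trend(alert_times: list[datetime]) -> list[tuple[datetime, int]]:
    """Count alert e-mails per clock hour, oldest hour first."""
    # Truncate each timestamp to the start of its hour, then count per bucket.
    buckets = Counter(t.replace(minute=0, second=0, microsecond=0) for t in alert_times)
    return sorted(buckets.items())


# Made-up sample data: three alerts spread over two hours.
alerts = [datetime(2016, 5, 3, 10, 5), datetime(2016, 5, 3, 10, 40), datetime(2016, 5, 3, 11, 10)]
for hour, count in hourly_trend(alerts):
    print(hour.strftime("%H:00"), count)
```

A rising count across consecutive hours is the kind of trend a single alert e-mail cannot show on its own.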
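
The TCP tests on slide 14 check whether, and how quickly, a remote service accepts connections. ThousandEyes runs such tests from its own agents; the standard-library Python below only approximates the idea locally by timing a TCP handshake. `tcp_connect_ms` is a hypothetical helper, not a ThousandEyes API.

```python
from __future__ import annotations

import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float | None:
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host name (if any) and completes the handshake,
        # so the measured time includes DNS lookup unless an IP address is passed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # refused, unreachable, timed out, name resolution failure, ...
        return None
    return (time.perf_counter() - start) * 1000.0
```

For example, `tcp_connect_ms("example.com", 443)` returns a latency in milliseconds when the handshake succeeds and `None` otherwise; repeating it on a schedule and alerting on `None` gives a crude reachability check.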
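
Slide 15's historical view of traffic into your ASN is built from many vantage points, each reporting the AS path it uses toward your prefixes. A minimal Python sketch of merging such paths into an AS-level graph follows; the monitor names and ASNs (from the documentation range 64496–64511) are made up, and this is not how ThousandEyes builds its visualization.

```python
from __future__ import annotations

from collections import defaultdict


def as_graph(paths: dict[str, list[int]]) -> dict[int, set[int]]:
    """Merge monitor AS paths (monitor side first, origin last) into AS -> next-hop ASNs."""
    edges: dict[int, set[int]] = defaultdict(set)
    for path in paths.values():
        # Collapse AS-path prepending, i.e. consecutive repeats of the same ASN.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for hop, next_hop in zip(hops, hops[1:]):
            edges[hop].add(next_hop)
    return dict(edges)


observed = {
    "monitor-a": [64500, 64501, 64511],
    "monitor-b": [64502, 64501, 64501, 64511],  # 64501 prepends itself once
}
print(as_graph(observed))  # {64500: {64501}, 64501: {64511}, 64502: {64501}}
```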
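
The replay feature on slide 16 steps through route data over a selected timeline. Assuming periodic snapshots of each monitor's AS path (a simplification; real BGP feeds are streams of updates), the changes an engineer would look for during diagnostics or forensics can be extracted as below. All names and data are hypothetical.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple

ASPath = Optional[Tuple[int, ...]]        # None = the monitor had no route
Snapshot = Tuple[int, Dict[str, ASPath]]  # (timestamp in seconds, monitor -> AS path)


def path_changes(snapshots: List[Snapshot]) -> List[Tuple[int, str, ASPath, ASPath]]:
    """Replay time-ordered snapshots and list every (time, monitor, old path, new path)."""
    last_seen: Dict[str, ASPath] = {}
    changes = []
    for ts, routes in snapshots:
        for monitor, path in routes.items():
            if monitor in last_seen and last_seen[monitor] != path:
                changes.append((ts, monitor, last_seen[monitor], path))
            last_seen[monitor] = path
    return changes


timeline = [
    (0, {"monitor-a": (64500, 64501, 64511), "monitor-b": (64502, 64511)}),
    (300, {"monitor-a": (64500, 64503, 64511), "monitor-b": (64502, 64511)}),
    (600, {"monitor-a": (64500, 64503, 64511), "monitor-b": None}),
]
for change in path_changes(timeline):
    print(change)
```

Restricting `timeline` to a window around an incident gives the same kind of before/after story that an RFO needs.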
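
Slides 19 and 20 expand the view to more hops away from your own network. Assuming the AS-level graph is stored as a mapping from each AS to its next-hop ASNs toward the origin, a breadth-first search limits the view to ASes within a chosen number of hops. The function name and ASNs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque


def within_hops(edges: dict[int, set[int]], origin: int, max_hops: int) -> dict[int, int]:
    """Return {ASN: hops from origin} for every AS at most max_hops upstream of origin."""
    # edges points toward the origin; invert it so the search walks away from the origin.
    upstream: dict[int, set[int]] = defaultdict(set)
    for asn, next_hops in edges.items():
        for nh in next_hops:
            upstream[nh].add(asn)
    distance = {origin: 0}
    queue = deque([origin])
    while queue:  # breadth-first, so each AS is labeled with its minimum hop count
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbor in upstream[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


edges = {64500: {64501}, 64501: {64511}, 64502: {64501}, 64503: {64500}}
print(within_hops(edges, 64511, 1))  # {64511: 0, 64501: 1}
```

Raising `max_hops` from 1 to 2 brings 64500 and 64502 into view, which is where an upstream problem would show up.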
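
Slide 21 narrows the view to only the nodes and links that saw an issue. Treating "saw an issue" as "the monitor's AS path changed between two points in time" (an assumption; a lost route appears as the path becoming `None`), the filter could be sketched in Python as follows. Names and data are hypothetical.

```python
from __future__ import annotations

from typing import Dict, Optional, Set, Tuple

ASPath = Optional[Tuple[int, ...]]
Link = Tuple[int, int]


def links(path: ASPath) -> Set[Link]:
    """AS-level links (hop, next hop) along a path; a missing route has no links."""
    hops = path or ()
    return set(zip(hops, hops[1:]))


def affected(before: Dict[str, ASPath], after: Dict[str, ASPath]) -> Tuple[Set[str], Set[Link]]:
    """Keep only monitors whose path changed, and only links that appeared or disappeared."""
    # Monitors present only in `after` are ignored in this sketch.
    monitors = {m for m, path in before.items() if after.get(m) != path}
    changed_links: Set[Link] = set()
    for m in monitors:
        # Symmetric difference: links on the old path or the new path, but not both.
        changed_links |= links(before[m]) ^ links(after.get(m))
    return monitors, changed_links


before = {"monitor-a": (64500, 64501, 64511), "monitor-b": (64502, 64511)}
after = {"monitor-a": (64500, 64503, 64511), "monitor-b": (64502, 64511)}
print(affected(before, after)[0])  # {'monitor-a'}
```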
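
Slide 23 compares destination prefixes to isolate a failure. Assuming per-monitor reachability results for each prefix, a hypothetical Python helper can separate "every prefix is failing" (likely a shared cause) from "only some prefixes are failing" (look at what differs for those prefixes). The prefixes below are documentation addresses.

```python
from __future__ import annotations


def isolate_by_prefix(reachable: dict[str, dict[str, bool]]) -> tuple[str, list[str]]:
    """reachable[prefix][monitor] is True if that monitor can reach that prefix."""
    failing = sorted(p for p, results in reachable.items() if not all(results.values()))
    if not failing:
        return "no failure", []
    if len(failing) == len(reachable):
        # Every prefix is affected: look for a cause they share (e.g., a common upstream).
        return "all prefixes", failing
    # Only some prefixes are affected: look at what differs for those prefixes.
    return "some prefixes", failing


results = {
    "192.0.2.0/24": {"monitor-a": True, "monitor-b": False},
    "198.51.100.0/24": {"monitor-a": True, "monitor-b": True},
}
print(isolate_by_prefix(results))  # ('some prefixes', ['192.0.2.0/24'])
```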
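
Slide 24 lists downtime metrics as future work. As a starting point, assuming a test that runs at a fixed interval and reports pass or fail, availability and approximate downtime follow from simple counting. The function below is illustrative, not an existing ThousandEyes metric.

```python
from __future__ import annotations


def downtime_metrics(results: list[bool], interval_s: int) -> tuple[float, int]:
    """From evenly spaced test results (True = passed), return (availability %, downtime s)."""
    if not results:
        raise ValueError("need at least one test result")
    failed = results.count(False)
    availability = 100.0 * (len(results) - failed) / len(results)
    # Assumption: each failed round counts as one full interval of downtime.
    return availability, failed * interval_s


print(downtime_metrics([True] * 95 + [False] * 5, interval_s=60))  # (95.0, 300)
```

Counting whole intervals overstates short blips and understates outages that fall between test rounds, so the test interval bounds how precise such a metric can be.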
