SlideShare a Scribd company logo

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at Scale

LinkedIn serves traffic for its 467 million members from four data centers and multiple PoPs spread geographically around the world. Serving live traffic from from many places at the same time has taken us from a disaster recovery model to a disaster avoidance model where we can take an unhealthy data center or PoP out of rotation and redistribute its traffic to a healthy one within minutes, with virtually no visible impact to users. The geographical distribution of our infrastructure also allows us to optimize the end-user's experience by geo routing users to the best possible PoP and datacenter. This talk provide details on how LinkedIn shifts traffic between its PoPs and data centers to provide the best possible performance and availability for its members. We will also touch on the complexities of performance in APAC, how IPv6 is helping our members and how LinkedIn stress tests data centers verify its disaster recovery capabilities.

1 of 35
Download to read offline
Trafficshifting: Avoiding Disasters &
Improving Performance at Scale
Michael Kehoe
Staff Site Reliability Engineer
LinkedIn
2
Overview
• Problem Statement
• Solution – How LinkedIn trafficshift’s
• Datacenter shifting
• PoP steering
• Challenges of APAC region
• IPv4 vs IPv6
• Questions
$ whoami
3
Michael Kehoe
• Staff Site Reliability Engineer (SRE) @ LinkedIn
• Production-SRE team
• Funny accent = Australian + 3 years American
$ whatis SRE
4
Michael Kehoe
• Site Reliability Engineering
• Operations for the production application
environment
• Responsibilities include
• Architecture design
• Capacity planning
• Operations
• Tooling
• Responsibilities include DNS/ CDN management &
Traffic infrastructure
5
Terminology
• PoP - Where LinkedIn terminates incoming requests.
• Fabric – Datacenter with full LinkedIn production stack deployed
• Loadtest – Stress test of a Fabric – to simulate a disaster scenario
Disaster Recovery
6
Problem Statement
• Fail between Fabrics
• Performance of applications is degraded
• Validate disaster recovery (DR) scenario
• Expose bugs and suboptimal configurations via loadtest
• Planned maintenance
• Fail between PoP’s
• Mitigate impact of a 3rd party provider maintenance/ failure (e.g. transport links)
• Software/ Configuration Bugs

Recommended

Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016Michael Kehoe
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInMichael Kehoe
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016Michael Kehoe
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...HostedbyConfluent
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know confluent
 
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...HostedbyConfluent
 

More Related Content

What's hot

Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
Introducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricIntroducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricAlexander Dean
 
Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)KafkaZone
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...HostedbyConfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesAlexander Dean
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7Site24x7
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Christian Horsdal
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...HostedbyConfluent
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EnginePraveen Thirukonda
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaAlexander Dean
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...HostedbyConfluent
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology confluent
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logAlexander Dean
 

What's hot (20)

Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Introducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricIntroducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabric
 
Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registries
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to Espresso
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified log
 

Viewers also liked

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialPooja Tangi
 
CouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidCouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidRicardo Monagas Medina
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Issa Fattah
 
Recruitment Analytics workshop - Endouble Antwerp 6-3-2017
Recruitment Analytics workshop  - Endouble Antwerp 6-3-2017Recruitment Analytics workshop  - Endouble Antwerp 6-3-2017
Recruitment Analytics workshop - Endouble Antwerp 6-3-2017Endouble
 
High Performance Magnolia with Anycast Routing
High Performance Magnolia with Anycast RoutingHigh Performance Magnolia with Anycast Routing
High Performance Magnolia with Anycast Routingbkraft
 
Service Redundancy and Traffic Balancing Using Anycast
Service Redundancy and Traffic Balancing Using AnycastService Redundancy and Traffic Balancing Using Anycast
Service Redundancy and Traffic Balancing Using AnycastSean Jain Ellis
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errorsHimanshu
 
Routing for an Anycast CDN
Routing for an Anycast CDNRouting for an Anycast CDN
Routing for an Anycast CDNTom Paseka
 
Endouble Advertising Workshop
Endouble Advertising WorkshopEndouble Advertising Workshop
Endouble Advertising WorkshopEndouble
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the dayPooja Tangi
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Diego Pacheco
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineeringguest90cec6
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Dmitri Zimine
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth modelHimanshu
 
Web performance optimization (WPO)
Web performance optimization (WPO)Web performance optimization (WPO)
Web performance optimization (WPO)Mariusz Kaczmarek
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE wayShawn Zhu
 
Best Practices And Next Gen Formats: Supercharging Web Content Performance
Best Practices And Next Gen Formats: Supercharging Web Content PerformanceBest Practices And Next Gen Formats: Supercharging Web Content Performance
Best Practices And Next Gen Formats: Supercharging Web Content PerformanceG3 Communications
 

Viewers also liked (20)

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
CouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidCouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones Android
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
 
Recruitment Analytics workshop - Endouble Antwerp 6-3-2017
Recruitment Analytics workshop  - Endouble Antwerp 6-3-2017Recruitment Analytics workshop  - Endouble Antwerp 6-3-2017
Recruitment Analytics workshop - Endouble Antwerp 6-3-2017
 
High Performance Magnolia with Anycast Routing
High Performance Magnolia with Anycast RoutingHigh Performance Magnolia with Anycast Routing
High Performance Magnolia with Anycast Routing
 
Service Redundancy and Traffic Balancing Using Anycast
Service Redundancy and Traffic Balancing Using AnycastService Redundancy and Traffic Balancing Using Anycast
Service Redundancy and Traffic Balancing Using Anycast
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
Routing for an Anycast CDN
Routing for an Anycast CDNRouting for an Anycast CDN
Routing for an Anycast CDN
 
Endouble Advertising Workshop
Endouble Advertising WorkshopEndouble Advertising Workshop
Endouble Advertising Workshop
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
Web performance optimization (WPO)
Web performance optimization (WPO)Web performance optimization (WPO)
Web performance optimization (WPO)
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
Best Practices And Next Gen Formats: Supercharging Web Content Performance
Best Practices And Next Gen Formats: Supercharging Web Content PerformanceBest Practices And Next Gen Formats: Supercharging Web Content Performance
Best Practices And Next Gen Formats: Supercharging Web Content Performance
 

Similar to APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at Scale

Trafficshifting: Avoiding Disasters & Improving Performance at Scale
Trafficshifting: Avoiding Disasters & Improving Performance at ScaleTrafficshifting: Avoiding Disasters & Improving Performance at Scale
Trafficshifting: Avoiding Disasters & Improving Performance at ScaleAPNIC
 
PLNOG 3: John Evans - Best Practices in Network Planning
PLNOG 3: John Evans - Best Practices in Network PlanningPLNOG 3: John Evans - Best Practices in Network Planning
PLNOG 3: John Evans - Best Practices in Network PlanningPROIDEA
 
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...wangbo626
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYAshnikbiz
 
Rzepnicki_thesis_presentation_2003(2) (1)
Rzepnicki_thesis_presentation_2003(2) (1)Rzepnicki_thesis_presentation_2003(2) (1)
Rzepnicki_thesis_presentation_2003(2) (1)Witold Rzepnicki
 
How can Big data accelerate CDN services ?
How can Big data accelerate CDN services ?How can Big data accelerate CDN services ?
How can Big data accelerate CDN services ?ANOOP KUMAR P
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudAPNIC
 
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...MongoDB
 
Hybrid Cloud Journey - Maximizing Private and Public Cloud
Hybrid Cloud Journey - Maximizing Private and Public CloudHybrid Cloud Journey - Maximizing Private and Public Cloud
Hybrid Cloud Journey - Maximizing Private and Public CloudRyan Lynn
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 
Motor vehicle emission checker danu-lap
Motor vehicle emission checker danu-lapMotor vehicle emission checker danu-lap
Motor vehicle emission checker danu-lapaidsdatahub
 
Freedom of Movement for redisconf19
Freedom of Movement for redisconf19Freedom of Movement for redisconf19
Freedom of Movement for redisconf19Richard Leddy
 
Transform Your Data Integration Platform From Informatica To ODI
Transform Your Data Integration Platform From Informatica To ODI Transform Your Data Integration Platform From Informatica To ODI
Transform Your Data Integration Platform From Informatica To ODI Jade Global
 
Improving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement HeuristicsImproving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement HeuristicsAtakanAral
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudRobert Sanders
 
Druid Optimizations for Scaling Customer Facing Analytics
Druid Optimizations for Scaling Customer Facing AnalyticsDruid Optimizations for Scaling Customer Facing Analytics
Druid Optimizations for Scaling Customer Facing AnalyticsAmir Youssefi
 
Azure Application Architecture Guide
Azure Application Architecture GuideAzure Application Architecture Guide
Azure Application Architecture GuideMasashi Narumoto
 
About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014Michal Harish
 

Similar to APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at Scale (20)

Trafficshifting: Avoiding Disasters & Improving Performance at Scale
Trafficshifting: Avoiding Disasters & Improving Performance at ScaleTrafficshifting: Avoiding Disasters & Improving Performance at Scale
Trafficshifting: Avoiding Disasters & Improving Performance at Scale
 
Play With Streams
Play With StreamsPlay With Streams
Play With Streams
 
PLNOG 3: John Evans - Best Practices in Network Planning
PLNOG 3: John Evans - Best Practices in Network PlanningPLNOG 3: John Evans - Best Practices in Network Planning
PLNOG 3: John Evans - Best Practices in Network Planning
 
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...
AME-1934 : Enable Active-Active Messaging Technology to Extend Workload Balan...
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITY
 
Rzepnicki_thesis_presentation_2003(2) (1)
Rzepnicki_thesis_presentation_2003(2) (1)Rzepnicki_thesis_presentation_2003(2) (1)
Rzepnicki_thesis_presentation_2003(2) (1)
 
How can Big data accelerate CDN services ?
How can Big data accelerate CDN services ?How can Big data accelerate CDN services ?
How can Big data accelerate CDN services ?
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
 
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...
MongoDB .local London 2019: Migrating a Monolith to MongoDB Atlas – Auto Trad...
 
Hybrid Cloud Journey - Maximizing Private and Public Cloud
Hybrid Cloud Journey - Maximizing Private and Public CloudHybrid Cloud Journey - Maximizing Private and Public Cloud
Hybrid Cloud Journey - Maximizing Private and Public Cloud
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 
Motor vehicle emission checker danu-lap
Motor vehicle emission checker danu-lapMotor vehicle emission checker danu-lap
Motor vehicle emission checker danu-lap
 
Freedom of Movement for redisconf19
Freedom of Movement for redisconf19Freedom of Movement for redisconf19
Freedom of Movement for redisconf19
 
Transform Your Data Integration Platform From Informatica To ODI
Transform Your Data Integration Platform From Informatica To ODI Transform Your Data Integration Platform From Informatica To ODI
Transform Your Data Integration Platform From Informatica To ODI
 
Improving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement HeuristicsImproving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement Heuristics
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the Cloud
 
Druid Optimizations for Scaling Customer Facing Analytics
Druid Optimizations for Scaling Customer Facing AnalyticsDruid Optimizations for Scaling Customer Facing Analytics
Druid Optimizations for Scaling Customer Facing Analytics
 
Решения WANDL и NorthStar для операторов
Решения WANDL и NorthStar для операторовРешения WANDL и NorthStar для операторов
Решения WANDL и NorthStar для операторов
 
Azure Application Architecture Guide
Azure Application Architecture GuideAzure Application Architecture Guide
Azure Application Architecture Guide
 
About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014
 

More from Michael Kehoe

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 

More from Michael Kehoe (16)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 

Recently uploaded

Basic Concepts of Material Science for Electrical and Electronic Materials ...
Basic Concepts of Material Science for  Electrical and Electronic Materials  ...Basic Concepts of Material Science for  Electrical and Electronic Materials  ...
Basic Concepts of Material Science for Electrical and Electronic Materials ...PeopleFinder
 
CDE_Sustainability Performance_20240214.pdf
CDE_Sustainability Performance_20240214.pdfCDE_Sustainability Performance_20240214.pdf
CDE_Sustainability Performance_20240214.pdf8-koi
 
Architectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaArchitectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaIgnacio J. Palma, Arch PhD.
 
Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxnikshaikh786
 
Center Enamel is the leading fire water tanks manufacturer in China.docx
Center Enamel is the leading fire water tanks manufacturer in China.docxCenter Enamel is the leading fire water tanks manufacturer in China.docx
Center Enamel is the leading fire water tanks manufacturer in China.docxsjzzztc
 
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHIINTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHIKiranKandhro1
 
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4
GDSC  Google Cloud Study jam Web Bootcamp - Day-4  Session 4GDSC  Google Cloud Study jam Web Bootcamp - Day-4  Session 4
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4SahithiGurlinka
 
Introduction to the telecom tower industry
Introduction to the telecom tower industryIntroduction to the telecom tower industry
Introduction to the telecom tower industryssuserf5bbfd
 
fat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningfat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningteddymebratie
 
Energy Efficient Social Housing for One Manchester
Energy Efficient Social Housing for One ManchesterEnergy Efficient Social Housing for One Manchester
Energy Efficient Social Housing for One Manchestermark alegbe
 
Model Approved Food/ sanitary Grade Flow Meter
Model Approved Food/ sanitary Grade Flow MeterModel Approved Food/ sanitary Grade Flow Meter
Model Approved Food/ sanitary Grade Flow MeterManasMicrosystems
 
Introduction and replication to DragonflyDB
Introduction and replication to DragonflyDBIntroduction and replication to DragonflyDB
Introduction and replication to DragonflyDBMarian Marinov
 
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputBasic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputGaurav Singh Rajput
 
Laser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsLaser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsPurva Nikam
 
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...C Sai Kiran
 
Deluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdfDeluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdfartpoa9
 
Chapter 1 - Drilling Fluid Functions GR.ppt
Chapter 1 - Drilling Fluid Functions GR.pptChapter 1 - Drilling Fluid Functions GR.ppt
Chapter 1 - Drilling Fluid Functions GR.pptzeidali3
 
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfForged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfVikasKumar11936
 
Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)MANICKAVASAHAM G
 
Center Enamel is the leading bolted steel tanks manufacturer in China.docx
Center Enamel is the leading bolted steel tanks manufacturer in China.docxCenter Enamel is the leading bolted steel tanks manufacturer in China.docx
Center Enamel is the leading bolted steel tanks manufacturer in China.docxsjzzztc
 

Recently uploaded (20)

Basic Concepts of Material Science for Electrical and Electronic Materials ...
Basic Concepts of Material Science for  Electrical and Electronic Materials  ...Basic Concepts of Material Science for  Electrical and Electronic Materials  ...
Basic Concepts of Material Science for Electrical and Electronic Materials ...
 
CDE_Sustainability Performance_20240214.pdf
CDE_Sustainability Performance_20240214.pdfCDE_Sustainability Performance_20240214.pdf
CDE_Sustainability Performance_20240214.pdf
 
Architectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaArchitectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi Arabia
 
Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptx
 
Center Enamel is the leading fire water tanks manufacturer in China.docx
Center Enamel is the leading fire water tanks manufacturer in China.docxCenter Enamel is the leading fire water tanks manufacturer in China.docx
Center Enamel is the leading fire water tanks manufacturer in China.docx
 
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHIINTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
 
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4
GDSC  Google Cloud Study jam Web Bootcamp - Day-4  Session 4GDSC  Google Cloud Study jam Web Bootcamp - Day-4  Session 4
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4
 
Introduction to the telecom tower industry
Introduction to the telecom tower industryIntroduction to the telecom tower industry
Introduction to the telecom tower industry
 
fat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningfat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refining
 
Energy Efficient Social Housing for One Manchester
Energy Efficient Social Housing for One ManchesterEnergy Efficient Social Housing for One Manchester
Energy Efficient Social Housing for One Manchester
 
Model Approved Food/ sanitary Grade Flow Meter
Model Approved Food/ sanitary Grade Flow MeterModel Approved Food/ sanitary Grade Flow Meter
Model Approved Food/ sanitary Grade Flow Meter
 
Introduction and replication to DragonflyDB
Introduction and replication to DragonflyDBIntroduction and replication to DragonflyDB
Introduction and replication to DragonflyDB
 
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputBasic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
 
Laser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsLaser And its Application's - Engineering Physics
Laser And its Application's - Engineering Physics
 
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...
Introduction to Machine Learning Unit-1 Notes for II-II Mechanical Engineerin...
 
Deluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdfDeluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdf
 
Chapter 1 - Drilling Fluid Functions GR.ppt
Chapter 1 - Drilling Fluid Functions GR.pptChapter 1 - Drilling Fluid Functions GR.ppt
Chapter 1 - Drilling Fluid Functions GR.ppt
 
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfForged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
 
Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)
 
Center Enamel is the leading bolted steel tanks manufacturer in China.docx
Center Enamel is the leading bolted steel tanks manufacturer in China.docxCenter Enamel is the leading bolted steel tanks manufacturer in China.docx
Center Enamel is the leading bolted steel tanks manufacturer in China.docx
 

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at Scale

Editor's Notes

  1. Good morning, my name is Michael Kehoe and in this presentation I’m going to talk about how LinkedIn shifts traffic between it’s PoP’s and datacenters to avoid disaster and improve site performance at scale
  2. So this morning I want to talk about the problem that we’re trying to solve, particularly in the context of APAC which is extremely challenging for internet companies Then we’ll deep-dive into how LinkedIn solves these problems to improve our availability and site performance. Specifically we’ll look at: Datacenter shifting PoP steering We’ll look at some of the challenges of operating in the APAC region, briefly talk about IPv6 adoption and then I’ll take questions
  3. So who am I? I’m a Staff Site Reliability Engineer (commonly referred to as SRE) at LinkedIn. I am on a team called Production-SRE, our team charter includes: Developing applications to improve MTTD and MTTR Build tools for efficient site issue troubleshooting, issue detection & correlation Assist in restoring stability to services during site critical issues Yes I have a slightly strange accent, it’s Australian with three 3 years of American.
  4. Site Reliability Engineering A term coined by Ben Treynor from Google You may also find it being called Devops/ Appops or Production Engineering Skillset based of: Sysadmin Network Engineer Architect Troubleshooter Software Engineer Role consists of: Architecture design Capacity planning Application Operations – Keeping the site healthy Writing automation and tooling SRE role/ philosophy differs between companies. At LinkedIn, SRE’s are responsible for DNS/ CDN management and traffic infrastructure
  5. So before we deep-dive, let’s go over some terminology PoP – Where LinkedIn terminates incoming requests to it’s datacenters. Spread geographically across the world Fabric – Datacenter where the full LinkedIn application stack is deployed. LinkedIn has 3 datacenters in the US and one in Singapore Loadtest – Where we stress test a Fabric to simulate a disaster.
  6. What are the use-cases for shifting traffic for Disaster Recovery purposes? Fabric: Performance of applications is degraded Site may be slow or users get errors Validate disaster recovery Plan for disasters (natural/ infrastructure/ code) Expose code bugs and suboptimal configurations via loadtest When the application infrastructure is under stress, easier to expose sub optimal configuration/ code Planned maintenance Intrusive infrastructure maintenance that may cause impact PoP Transport provider maintenance More common in Asia given the large number of submarine cables we utilize Software bugs
  7. So let’s look at the performance side of the equation. How can shifting traffic improve performance: Fabric: Members use the closest datacenter to them Manage capacity of a datacenter PoP: Steering Users to the best possible PoP gives us significant performance advantage By measuring CDN availability/ performance using RUM (talk about RUM and how it works), we can speed-up page-load-time by 50%
  8. **** NOTE: Move to excel and remove values *** Average linkedin.com page load time for countries using US Data-centers (measured by Catchpoint – All Major Metro Nodes around the world)
  9. Average linkedin.com page load time for countries using APAC Data-centres (measured by Catchpoint – Top 10 APAC metro nodes).
  10. Delta between US and APAC performance. Average is 2.5s
  11. LinkedIn has done extensive research on the impact site-speed has on user-engagement. From this research we know that slow page load times affects engagement and transaction This in-turn affects our revenue. This is imporant!
  12. So what does LinkedIn’s traffic architecture look like DNS routes users to the ‘best’ PoP (more on that later) IPVS (IP Virtual Server, a Linux kernel module) announces Unicast and Anycast addresses for www.linkedin.com and terminates TCP connections ATS (Apache Traffic Server) terminates SSL sessions and proxies requests to datacenters Stickyrouting service (talk about in a minute) tells the PoP (specifically ATS) which datacenter/ fabric to send the request to ATS in the datacenter proxies requests to frontend services
  13. Let’s talk about stickyrouting and Fabric-Shifting
  14. We run an offline Hadoop job to calculate primary and secondary datacenters for users. Hadoop is a distributed computing mechanism that proceses large datasets We store this data in an in-house key-value store named Espresso Stickyrouting serves information over a RESTFul interface to our Edge-PoP’s
  15. At LinkedIn, we partition our traffic into various classes so we can control them independently Logged-in vs Logged-out CDN traffic Monitoring traffic Microsites Logged-in users get assigned to a bucket (an arbitrary partition) We then online/ offline buckets in a fabric to manipulate the distribution of traffic between fabrics
  16. Benefits: Serve the request as close to the user Capacity management - Ensure that data-centers aren’t overloaded Personal data routing – lowers cost to serve
  17. My team built ’TrafficShift’ app to help automate datacenter routing’ We’ve automated fail-outs of datacenters Also allows us to do automated load-testing of our datacenters
  18. You can see, LTX1 (Texas datacenter) is failed out
  19. Example of failing out of East Coast Datacenter Top graph – Online buckets Bottom graph – Distribution of traffic
  20. Automation to validate DR Tell the engine which datacenter to stress, how much traffic, and what time periods and it will execute for us Traffic engine watches our alerting system to ensure we do not negatively impact the member experience
  21. Let’s talk about how users connect to LinkedIn’s PoP’s
  22. LinkedIn’s PoP locations Note that PoP in India is red – means it’s offline – talk about that further later
  23. Sometimes need to fail out for 3rd party issues – remember the red dot on the PoP map. Steer users to the next-best PoP. In this case. India to Singapore Note the slow traffic tail-off in TMU1 – DNS TTL’s not being honored For Anycast traffic, we withdraw the prefix announcement For Unicast, Fail healthchecks that DNS providers use to check if we are serving from that site
  24. Remember that red dot before. Sometimes by pure necessity, we need to fail out of PoP’s to mitigate impact or potential impact. In this case, move India traffic from India PoP to Singapore This does have an impact on client connect times and also page-load times.
  25. UAE has 2 ASNs and GeoDNS routes both to India 5384 – That’’s ok 15802 – Not ok
  26. RUM DNS recognizes optimal PoPs for ASN 15802 Two better paths, Hong Kong and London/ Dublin
  27. Drop in connect time after the change
  28. IPv6 – performs up to 40% better We’ve grown from 3% IPv6 traffic in July 2014 to over 12% today