[@IndeedEng] Boxcar: A self-balancing distributed services protocol

Video available at: http://www.youtube.com/watch?v=E1ok08TVxDw

Indeed's flagship job search product has evolved over the years to meet new challenges. It began as a single, monolithic web application that grew larger and increasingly complex as we built new features. To remedy this, we implemented a service-oriented architecture to improve system availability, scalability, and maintainability. We examined common practices for service-oriented architectures and discovered ways to improve on the state of the art. We developed these ideas into a new framework called Boxcar. In this talk, we will discuss the scaling problems we solved, the innovative ideas behind Boxcar, and how we built the scalable architecture that we now use throughout our systems.

R.B. Boyer is a Software Engineer who has been with Indeed since late 2007. Over the years he has worked on a variety of projects, including distributed storage, authentication, and service architectures.

Presentation Transcript

  • Boxcar A self-balancing distributed services protocol
  • R.B. Boyer Software Engineer Resume
  • I help people get jobs.
  • I solve interesting problems.
  • Boxcar was the solution to a problem:
  • Building
  • How we build products Simple Fast Comprehensive Relevant
  • How we build systems Simple Fast Resilient Scalable
  • Simple “I want my application to be more complicated” - No one ever
  • Complexity creates confusion. Confusion breeds bugs.
  • Fast “I want my application to be slower” - No one ever
  • conducted a speed test
  • +500 milliseconds of latency per search
  • 20% fewer searches
  • Speed is a feature
  • Resilient “I want my users to experience outages” - No one ever
  • Programs crash. Machines die.
  • Minimize vulnerability to any failure
  • Scalable “My system will only need to support 10 users” - No one ever
  • Scale with MORE machines, not BIGGER machines
  • TL;DR:
  • Indeed Jobs Sites Job Seekers
  • Aggregation Jobs Sites Job Search Job Seekers
  • Job Search Aggregation
  • Challenge! Job Search Aggregation
  • Challenge: keep this Simple Fast Resilient Scalable
  • Options:
  • Share data access?
  • Example: Shared Database
  • Shared Database Main Database
  • Shared Database Main Application Main Database
  • Shared Database Main Application Analysis Tool Main Database
  • Shared Database Main Application Analysis Tool Billing Application Main Database
  • Shared Database Main Application Main Database Analysis Tool Billing Application Intern Project
  • Shared Database Main Application Main Database Analysis Tool Billing Application Intern Project Other Intern Project
  • Shared Database Main Application Main Database Analysis Tool Billing Application Intern Project Other Intern Project Email Tool
  • Shared Database Main Application Main Database Analysis Tool This is an anti-pattern Billing Application Intern Project Other Intern Project Email Tool
  • On a long enough timeline...
  • Maintenance Nightmare
  • Share data access: insulate data from consumers
  • Shared Database Main Application Main Database Analysis Tool Billing Application Intern Project Other Intern Project Email Tool
  • Insulated Database Main Application Main Database Main Service Analysis Tool Billing Application Intern Project Other Intern Project Email Tool
  • Service?
  • Service Client Client Client Client
  • Client Client Client Client NETWORK Service
  • Client Client Client Client NETWORK Service Icky Technical Stuff
  • Service NETWORK Databases Client Client Client Client Logging Caches Business Logic ... Client API Icky Technical Details
  • Client API Service.getJobs([12345, 62])
  • Icky Technical Details SELECT * FROM jobs AS j LEFT JOIN companyinfo AS ci ON j.id=ci.job_id LEFT JOIN locations AS loc ON loc.id=j.location_id WHERE j.id IN (12345, 62)
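The contrast between the client API and the "icky technical details" can be sketched in a few lines. This is only an illustration, not Indeed's implementation: the `JobService` class, the column names, and the in-memory SQLite schema are assumptions made so the example runs on its own. Only the `getJobs([12345, 62])` call shape and the join structure come from the slides.

```python
import sqlite3


class JobService:
    """Hypothetical service facade: callers see get_jobs(), never the SQL."""

    def __init__(self, conn):
        self._conn = conn

    def get_jobs(self, job_ids):
        if not job_ids:
            return []
        placeholders = ",".join("?" for _ in job_ids)
        rows = self._conn.execute(
            "SELECT j.id, j.title, ci.name, loc.city "
            "FROM jobs AS j "
            "LEFT JOIN companyinfo AS ci ON j.id = ci.job_id "
            "LEFT JOIN locations AS loc ON loc.id = j.location_id "
            f"WHERE j.id IN ({placeholders}) ORDER BY j.id",
            list(job_ids),
        ).fetchall()
        return [dict(zip(("id", "title", "company", "city"), row)) for row in rows]


def demo_database():
    """Tiny stand-in for the main database (the schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, location_id INTEGER);
        CREATE TABLE companyinfo (job_id INTEGER, name TEXT);
        CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT);
        INSERT INTO locations VALUES (1, 'Austin');
        INSERT INTO jobs VALUES (62, 'Engineer', 1), (12345, 'Analyst', NULL);
        INSERT INTO companyinfo VALUES (62, 'Example Co');
    """)
    return conn


service = JobService(demo_database())
jobs = service.get_jobs([12345, 62])
```

Because consumers depend only on `get_jobs`, the tables behind it can change without touching any client.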
  • Service Oriented Architecture
  • Boxcar
  • Boxcar is a... self-balancing distributed services protocol
  • Origin Story
  • There was a life before Boxcar
  • There were services before Boxcar
  • Pick one:
  • Doc Service
  • Document Serving Service aka “Doc Service” http://go.indeed.com/docservice
  • Doc Service controls access to JOBS
  • Building Blocks: Webapp (wants jobs), Doc Service (controls access to jobs), Docstore (stores jobs)
  • Build it
  • Webapp
  • Webapp Doc Service Docstore
  • Mission Accomplished Webapp Doc Service Docstore
  • But is it good?
  • How we build systems Simple Fast Resilient Scalable
  • Goodness Metric: simple deploys; efficient networking (Fast); resilient; horizontally scalable
  • Webapp Doc Service Docstore
  • ✘ Resilient Webapp Doc Service Docstore
  • Add Resilience
  • Webapp Doc Service Docstore
  • Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Front-end Load Balancer Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Siloed Stacks Front-end Load Balancer Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Siloed Stacks ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ? Horizontally scalable
  • Scaling Silos Front-end Load Balancer Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Scaling Silos Front-end Load Balancer Webapp Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Scaling Silos Front-end Load Balancer Webapp Webapp Webapp Doc Service Doc Service Doc Service Docstore Docstore Docstore
  • Need bigger and bigger machines
  • Vertical Scaling
  • Siloed Stacks ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✘ Horizontally scalable
  • Siloed Stacks ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✘ Horizontally scalable Services Version 1
  • Improve scalability
  • Front-end Load Balancer Webapp Webapp Webapp Doc Service Doc Service Doc Service Docstore Docstore Docstore
  • Front-end Load Balancer Webapp Webapp Webapp Doc Service Load Balancer Doc Service Doc Service Docstore Docstore
  • Per-Service Balancer Front-end Load Balancer Webapp Webapp Webapp Doc Service Load Balancer Doc Service Doc Service Docstore Docstore
  • Per-Service Balancer ~ Simple deploys ? Efficient networking (Fast) ? Resilient ✓ Horizontally scalable
  • Proxying isn’t free ✘2x Bandwidth Webapp Doc Service Load Balancer Doc Service
  • Per-Service Balancer ~ Simple deploys ✘ Efficient networking (Fast) ? Resilient ✓ Horizontally scalable
  • Resilience Front-end Load Balancer Webapp Webapp Webapp SINGLE POINT OF FAILURE Doc Service Doc Service Docstore Docstore
  • Need two balancers ...and a way to balance between them?
  • Load Balancer Balancing: master/slave, shared IP address, heartbeat between nodes. Complex.
  • Resilience Front-end Load Balancer Webapp Webapp Webapp Doc Service Load Balancer Doc Service Load Balancer Doc Service Doc Service Docstore Docstore
  • Best explained by our Operations folks: “Redundant Array of Inexpensive Datacenters” http://go.indeed.com/raid
  • Per-Service Balancer ~ Simple deploys ✘ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable
  • Per-Service Balancer ~ Simple deploys ✘ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable Services Version 2
  • Reduce network waste
  • Front-end Load Balancer Webapp Webapp Webapp Doc Service Load Balancer Doc Service Doc Service Docstore Docstore
  • Front-end Load Balancer Webapp Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Naive Round Robin Front-end Load Balancer Webapp Webapp Webapp Doc Service Doc Service Docstore Docstore
  • Naive Round Robin ✓ Simple deploys ? Efficient networking (Fast) ? Resilient ✓ Horizontally scalable
  • Direct Connections Webapp ✓1x Bandwidth Doc Service
  • Naive Round Robin ✓ Simple deploys ✓ Efficient networking (Fast) ? Resilient ✓ Horizontally scalable
  • Server A Server B
  • REQUEST Server A Server B
  • ✘ REQUEST Server A Server B
  • REQUEST Server A Server B
  • Server A Server B REQUEST
  • Naive Round Robin ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable
  • Naive Round Robin ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable ? Balanced
  • Slow Fast
  • Can’t keep up Slow Fast
  • Naive Round Robin ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable ✘ Balanced
  • Naive Round Robin ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable ✘ Balanced NOPE
  • Ensure balance
  • Front-end Load Balancer Webapp Webapp Webapp Doc Service Load Balancer Doc Service Doc Service Docstore Docstore
  • Front-end Load Balancer Webapp Webapp Webapp Distribute! Doc Service Doc Service Docstore Docstore
  • Front-end Load Balancer Webapp Webapp B Webapp B B Doc Service Doc Service Docstore Docstore
  • Front-end Load Balancer Web App Web App B Web App B Doc Service Doc Service Docstore Docstore B
  • Boxcar Front-end Load Balancer Web App Web App B Web App B Doc Service Doc Service Docstore Docstore B
  • Naive Round Robin Per-Service Balancer
  • The Boxcar balancing algorithm is simple
  • Gist: Servers assign a numeric value to each connection. Clients use the connection with the lowest numeric value to service each request.
  • Server A
  • Server A Server A
  • Server A Slot 0 Slot 1 Slot 2 Slot 3 Slot 4 ...
  • Slot Numbers: just numbers; no limit; NOT a priority; ONLY used for balancing (Server A: Slot 0, Slot 1, Slot 2, Slot 3, Slot 4, ...)
  • LOW slot numbers are the BEST slot numbers
  • Server A Slot 0 USED Slot 1 Hello! USED Slot 2 Client 2 Slot 3 Slot 4 USED ...
  • Server A Slot 0 USED Slot 1 USED Slot 2 USED Slot 3 Slot 4 USED Slot 2 Client 2 ...
  • Client 2
  • Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 29 Slot 30 Slot 57 B
  • long-lived connections Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 29 Slot 30 Slot 57 B
  • Clients are greedy (“MINE!”): want best connections; continually look for better connections; close worst connections
  • Background thread maintains the connection pool
  • Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 29 Slot 30 Slot 57 B
  • Slot 17 Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 29 Slot 30 Slot 57 B
  • Slot 17 Server Client 2 Server Slot 0 Slot 2 A B Slot 12 Slot 29 Slot 30 ✘ Slot 57 ✘
  • Slot 17 Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 29 Slot 30 B
  • Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 17 Slot 29 Slot 30 B
  • Server Client 2 Server Slot 0 Slot 2 A Slot 12 Slot 17 Slot 29 Slot 30 Continues forever B
  • Incoming Requests Client 2 Slot 0 ACTIVE Slot 2 ACTIVE Slot 12 [idle] Slot 29 ACTIVE Slot 30 [idle] Slot 57 [idle] GetJobs()
  • Incoming Requests Client 2 Slot 0 ACTIVE Slot 2 ACTIVE Slot 12 ACTIVE Slot 29 ACTIVE Slot 30 [idle] Slot 57 [idle] GetJobs()
  • Connections NOT established on-demand
  • Requests to Busy Pool Client 2 Slot 0 ACTIVE Slot 2 ACTIVE Slot 12 ACTIVE Slot 29 ACTIVE Slot 30 ACTIVE Slot 57 ACTIVE GetJobs()
  • Requests to Busy Pool Client 2 Slot 0 ACTIVE Slot 2 ACTIVE Slot 12 ACTIVE Slot 29 ACTIVE Slot 30 ACTIVE Slot 57 ACTIVE ✘ GetJobs() ERROR!
  • Sizing the pool properly is imperative!
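The request path on the slides above (pick the idle connection with the lowest slot number, fail when every connection is busy) can be sketched as follows. The names `PooledConnection`, `PoolExhausted`, and `acquire` are hypothetical; the slides only show the behavior, not an API.

```python
class PoolExhausted(Exception):
    """Raised when every pooled connection is already serving a request."""


class PooledConnection:
    def __init__(self, slot, active=False):
        self.slot = slot      # slot number assigned by the server
        self.active = active  # True while a request is in flight


def acquire(pool):
    """Return the idle connection with the lowest slot number and mark it active."""
    idle = [conn for conn in pool if not conn.active]
    if not idle:
        # Connections are not established on demand, so a busy pool is an error.
        raise PoolExhausted("all connections are ACTIVE")
    best = min(idle, key=lambda conn: conn.slot)
    best.active = True
    return best


# Pool state taken from the "Incoming Requests" slide: slots 0, 2 and 29 are busy.
pool = [
    PooledConnection(0, active=True),
    PooledConnection(2, active=True),
    PooledConnection(12),
    PooledConnection(29, active=True),
    PooledConnection(30),
    PooledConnection(57),
]
chosen = [acquire(pool).slot for _ in range(3)]

try:
    acquire(pool)
    exhausted = False
except PoolExhausted:
    exhausted = True
```

Three requests drain the idle connections in slot order, and a fourth fails, which is why the pool has to be sized for peak concurrency.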
  • Gist Redux: Servers assign a numeric value to each connection. Clients use the connection with the lowest numeric value to service each request.
  • Balanced load is emergent behavior
  • Load Balancing Simulations
  • Server A Server B Client X
  • Server A slot 0 Server B Client X 0
  • Server A slot 0 Client X 0 0 Server B slot 0
  • Server A slot 0 Server B slot 0 Client X 0 0
  • Server A slot 0 slot 1 1 1 Server B slot 0 slot 1 Client X 0 0
  • Server A slot 0 Client X 0 0 Server B slot 0 Steady-state balance
  • Server A slot 0 Client X 0 0 New Clients Join Server B slot 0
  • Server A slot 0 Client X 0 0 Client Y Server B slot 0
  • Server A slot 0 slot 1 Client X 0 0 1 1 Server B slot 0 slot 1 Client Y
  • Server A slot 0 slot 1 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1
  • Server A slot 0 slot 1 slot 2 Client X 0 0 2 Server B slot 0 slot 1 slot 2 2 Client Y 1 1
  • Server A slot 0 slot 1 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1
  • Server A slot 0 slot 1 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1 Client Z
  • Server A slot 0 slot 1 slot 2 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1 slot 2 2 2 Client Z
  • Server A slot 0 slot 1 slot 2 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 2 2
  • Server A slot 0 slot 1 slot 2 slot 3 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1 slot 2 slot 3 3 3 Client Z 2 2
  • Server A slot 0 slot 1 slot 2 Client X 0 0 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 2 2 Steady-state balance
  • Server A slot 0 slot 1 slot 2 Client X 0 0 Server Failure Server B slot 0 slot 1 slot 2 Client Y 1 1 Client Z 2 2
  • Server A slot 0 slot 1 slot 2 Client X 0 Client Y 1 Server B slot 0 Client Z 2
  • Server A slot 0 slot 1 slot 2 Client X 0 Client Y 1 Server B Client Z 2
  • Server A slot 0 slot 1 slot 2 slot 3 Client X 0 Client Y 1 Server B 3 Client Z 2
  • Server A slot 0 slot 1 slot 2 slot 3 Client X 0 Client Y 1 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 Client X 0 4 Client Y 1 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 5 Client Y 1 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Client Y 1 5 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Client Y 1 5 Server B Client Z 2 3 Steady-state balance
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Server Restored Client Y 1 5 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Client Y 1 5 Server B Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Client Y 1 5 Server B slot 0 0 Client Z 2 3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Server B slot 0 Client X 0 4 Client Y 1 5 Client Z 2 0<3
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Server B slot 0 Client X 0 4 Client Y 1 5 Client Z 2 0
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Server B slot 0 Client X 0 4 Client Y 1 5 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Client X 0 4 Client Y 1 5 1 Server B slot 0 slot 1 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 Server B slot 0 slot 1 Client X 0 4 Client Y 1 1<5 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 Client X 0 4 Client Y 1 1 Server B slot 0 slot 1 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 Client X 0 4 2 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 slot 3 slot 4 Client X 0 2<4 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Client X 0 2 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Client X 0 2 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 0 2 Steady-state balance
  • Server A slot 0 slot 1 slot 2 Client X 0 2 Client Shutdown Server B slot 0 slot 1 slot 2 Client Y 1 1 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Client X 0 2 Client Y 1 1 Server B slot 0 slot 1 slot 2 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Server B slot 0 slot 1 slot 2 Client X 0 2 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Server B slot 0 slot 1 slot 2 Client X 0 2 1 Client Z 0 2
  • Server A slot 0 slot 1 slot 2 Server B slot 0 slot 1 slot 2 Client X 0 2 Client Z 0 1<2
  • Server A slot 0 Server B slot 0 slot 1 slot 2 Client X 0 2 Client Z 0 1
  • Server A slot 0 slot 1 Server B slot 0 slot 1 slot 2 1 Client X 0 2 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1<2 Server B slot 0 slot 1 slot 2 Client Z 0 1
  • Server A slot 0 slot 1 Server B slot 0 slot 1 Client X 0 1 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1 Server B slot 0 slot 1 Client Z 0 1 Steady-state balance
  • Server A slot 0 slot 1 Client X 0 1 Client Rejoins Server B slot 0 slot 1 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1 Client Y Server B slot 0 slot 1 Client Z 0 1
  • Server A slot 0 slot 1 slot 2 Client X 0 1 2 Client Y 2 Server B slot 0 slot 1 slot 2 Client Z 0 1
  • Server A slot 0 slot 1 slot 2 Client X 0 1 Client Y 2 2 Server B slot 0 slot 1 slot 2 Client Z 0 1
  • Server A slot 0 slot 1 slot 2 Client X 0 1 Client Y 2 2 Server B slot 0 slot 1 slot 2 Client Z 0 1 Steady-state balance
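The join and shutdown scenarios above can be approximated with a small model. This is a sketch under stated assumptions, not the Boxcar implementation: it assumes servers hand out the lowest free slot number, that a client keeps a new connection only when its slot is strictly lower than its current worst, and that the background thread probes servers in a fixed order. The `Server`, `Connection`, and `Client` classes and the `run_maintenance` helper are hypothetical.

```python
from dataclasses import dataclass


class Server:
    def __init__(self, name):
        self.name = name
        self.used = set()  # slot numbers currently held by clients

    def connect(self):
        slot = 0
        while slot in self.used:  # assumption: lowest free slot is handed out
            slot += 1
        self.used.add(slot)
        return Connection(self, slot)


@dataclass
class Connection:
    server: Server
    slot: int

    def close(self):
        self.server.used.discard(self.slot)


class Client:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.pool = []

    def probe(self, server):
        """One maintenance step: try a new connection, keep it only if it is better."""
        candidate = server.connect()
        if len(self.pool) < self.pool_size:
            self.pool.append(candidate)
            return
        worst = max(self.pool, key=lambda conn: conn.slot)
        if candidate.slot < worst.slot:
            self.pool.remove(worst)
            worst.close()
            self.pool.append(candidate)
        else:
            candidate.close()

    def shutdown(self):
        for conn in self.pool:
            conn.close()
        self.pool.clear()

    def slots(self):
        return sorted(conn.slot for conn in self.pool)


def run_maintenance(clients, servers, rounds=2):
    for client in clients:
        for _ in range(rounds):
            for server in servers:
                client.probe(server)


a, b = Server("A"), Server("B")
x, y, z = Client(2), Client(2), Client(2)

run_maintenance([x, y, z], [a, b])  # clients join one after another
joined = (x.slots(), y.slots(), z.slots())

x.shutdown()                        # client X leaves
run_maintenance([y, z], [a, b])     # survivors drift to lower slots
after_shutdown = (y.slots(), z.slots())
```

In this model each client ends up with one connection per server, and after a shutdown the remaining clients move down into the freed slots, so balance emerges without any coordinator.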
  • Why does this Balance?
  • Connections are like running water seeking lower ground
  • Slots Connections Servers
  • Roughly Equal Distribution Slots Connections Servers
  • Edge cases
  • Server A slot 0 slot 1 Server B slot 0 slot 1 Client X 0 1 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1 Balanced but not ideal Server B slot 0 slot 1 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1 Server B slot 0 Client Z
  • Server A slot 0 slot 1 Client X 0 1 Server B slot 0 Client Z EMPTY POOL!
  • Server A slot 0 slot 1 Client X 0 1 ✘ Resilient Server B slot 0 EMPTY POOL! Client Z
  • Fix by adding entropy, aka “Table Shaking”
  • Table Shaking: Servers regularly hang up on connections. Clients expect failed connections. Failures are retried on new connections. Bad configurations are less likely.
  • Server A slot 0 slot 1 Client X 0 1 Table Shaking turns this Server B slot 0 slot 1 Client Z 0 1
  • Server A slot 0 slot 1 Client X 0 1 Into this Server B slot 0 slot 1 Client Z 0 1
  • Server A slot 0 slot 1 Server B slot 0 Client X 0 Client Z 1
  • Server A slot 0 slot 1 Client X 0 YAY! YAY! Server B slot 0 Client Z 1
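The client side of table shaking (expect a connection to fail, retry on another one) might look roughly like this. It is a minimal sketch: `FakeConnection`, `DroppedConnection`, and `send_with_retry` are hypothetical names, and the real protocol's error handling is not described on the slides.

```python
class DroppedConnection(Exception):
    """The server hung up on this connection (for example, during table shaking)."""


class FakeConnection:
    """Stand-in for a pooled connection; `alive` simulates a server hang-up."""

    def __init__(self, slot, alive=True):
        self.slot = slot
        self.alive = alive

    def send(self, request):
        if not self.alive:
            raise DroppedConnection(f"slot {self.slot} was hung up")
        return f"{request} via slot {self.slot}"


def send_with_retry(pool, request):
    """Try connections in slot order; drop dead ones and retry on the next."""
    for conn in sorted(pool, key=lambda c: c.slot):
        try:
            return conn.send(request)
        except DroppedConnection:
            # A background maintenance thread would later replace this connection.
            pool.remove(conn)
    raise RuntimeError("empty pool: no live connections left")


pool = [FakeConnection(0, alive=False), FakeConnection(1)]
result = send_with_retry(pool, "GetJobs()")
```

The caller never sees the dropped connection; the request simply lands on the next-best slot.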
  • Balancing Tricks: Handicapping
  • Handicapping is Server Self-quarantine
  • Handicapping: Exploit slot number assignment. Unhealthy servers inflate slot numbers. Clients naturally avoid these servers.
  • Slots Connections Servers
  • Unhealthy Slots Connections Servers
  • Slots Connections Servers
  • Unhealthy Slots Connections Servers
  • Slots Connections Servers
  • Slots Connections Servers graceful degradation
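Handicapping can be illustrated with a toy model in which an unhealthy server starts numbering its slots from a large offset. The offset-based scheme, the `handicap` attribute, and the `rebalance` helper are assumptions for the sake of the example; the slides say only that unhealthy servers inflate slot numbers.

```python
class Server:
    def __init__(self, name, handicap=0):
        self.name = name
        self.handicap = handicap  # assumption: unhealthy servers raise this offset
        self.used = set()         # advertised slot numbers currently in use

    def assign_slot(self):
        slot = self.handicap      # numbering starts at the handicap
        while slot in self.used:
            slot += 1
        self.used.add(slot)
        return slot

    def release(self, slot):
        self.used.discard(slot)


def rebalance(pool, servers, pool_size, rounds=1):
    """Greedy client maintenance: keep a new connection only if it beats the worst."""
    for _ in range(rounds):
        for server in servers:
            slot = server.assign_slot()
            if len(pool) < pool_size:
                pool.append((slot, server))
                continue
            worst = max(pool, key=lambda conn: conn[0])
            if slot < worst[0]:
                pool.remove(worst)
                worst[1].release(worst[0])
                pool.append((slot, server))
            else:
                server.release(slot)
    return pool


healthy = Server("A")
sick = Server("B", handicap=100)  # self-quarantined server

pool = rebalance([], [healthy, sick], pool_size=2, rounds=2)
during_quarantine = sorted((slot, server.name) for slot, server in pool)

sick.handicap = 0                 # server recovers and stops inflating
rebalance(pool, [healthy, sick], pool_size=2)
after_recovery = sorted((slot, server.name) for slot, server in pool)
```

The client needs no health checks of its own: inflated slot numbers lose to ordinary ones, and the connections return once the server advertises low slots again.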
  • Is Boxcar good?
  • Boxcar ✓ Simple deploys ✓ Efficient networking (Fast) ? Resilient ✓ Horizontally scalable ? Balanced
  • Clients are pessimistic: failure is expected
  • Boxcar ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable ? Balanced
  • Balance Connections Not Requests
  • Balancing Review: Naive Round Robin
  • Slow Fast
  • Can’t keep up Slow Fast
  • The problem was that requests (connections) piled up
  • Boxcar has a fixed number of connections, so there’s nothing to pile up
  • Slow Server Fast Server
  • Client 7 9 Slot 7 Slot 9 Slow Server Fast Server
  • Client 7 9 0 1 Slot 7 Slot 9 Slow Server Fast Server
  • Client 7 9 Slot 7 Slot 9 Slow Server Fast Server
  • Client 7 9 2 requests 4 requests Slot 7 Slot 9 Slow Server Fast Server
  • Slow servers handle fewer requests
  • No overloaded servers
  • All requests are serviced
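A tiny discrete-time simulation shows why a fixed set of connections keeps a slow server from being overloaded. The service times (4 ticks for the slow server, 1 tick for the fast one) and the arrival rate of one request per tick are illustrative assumptions, not measurements from the talk; the `simulate` function is hypothetical.

```python
def simulate(connections, arrivals):
    """connections maps slot number -> service time in ticks.

    One request arrives per tick and goes to the idle connection with the
    lowest slot number. Returns (requests handled per slot, rejected count).
    """
    busy_until = {slot: 0 for slot in connections}
    handled = {slot: 0 for slot in connections}
    rejected = 0
    for tick in range(arrivals):
        idle = [slot for slot in connections if busy_until[slot] <= tick]
        if not idle:
            rejected += 1  # the pool is fully busy at this tick
            continue
        slot = min(idle)
        busy_until[slot] = tick + connections[slot]
        handled[slot] += 1
    return handled, rejected


# Slot 7 leads to the slow server, slot 9 to the fast one, as on the slides.
handled, rejected = simulate({7: 4, 9: 1}, arrivals=6)
```

With these numbers the slow server takes two requests while the fast one takes four, and nothing queues up behind the slow server because its single connection is simply unavailable while it is busy.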
  • Load balancing is probabilistic
  • Boxcar ✓ Simple deploys ✓ Efficient networking (Fast) ✓ Resilient ✓ Horizontally scalable ✓ Balanced
  • Good enough for Indeed
  • Services well over a BILLION requests every day
  • Fundamental technology
  • Powering over 20 different services
  • In production since 2009
  • Service Oriented Architecture
  • Q&A