14. 9 Data Centers?
Global high availability
• Mapbox is critical infrastructure for our customers
• Mapbox SLA: 99.9%
• Problems for high availability
• AWS problems
• Mapbox software or configuration problems
• Critical deploys
24. • Grid over the world
• Every cell of the grid is a tile
• Different zoom levels
• Zoom level 0 is the world
• Zoom level 13 is a city
• Every tile is identified by map ID, coordinates and zoom level
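The tile addressing above can be sketched in a few lines. This is a minimal illustration that assumes the common Web Mercator XYZ tiling scheme, which the slides do not name explicitly; `lonlat_to_tile` and `tile_id` are hypothetical helper names, and the ID layout is only an example.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) grid cell containing a point at the given zoom level.

    Assumes Web Mercator XYZ tiling: 2**zoom tiles per axis, origin top-left.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_id(map_id, zoom, x, y):
    """Combine map ID, zoom level and coordinates into one identifier.

    The "map_id/zoom/x/y" layout is an assumption made for illustration.
    """
    return f"{map_id}/{zoom}/{x}/{y}"
```

At zoom level 0 every point falls into the single tile (0, 0); each additional zoom level doubles the number of tiles along each axis.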
31. CDN
• When a request comes in:
• Find nearest edge location
• Terminate TLS
• Match request to behaviour
• Look in cache (based on URL & Query String)
• If object is there: return
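The cache lookup step can be sketched as follows. This is a simplified model, not how any particular CDN is implemented: `EdgeCache` and `fetch_from_origin` are hypothetical names, and the miss path (forward to the origin, then store the response) is an assumption, since the slide only lists the hit path.

```python
from urllib.parse import urlsplit


class EdgeCache:
    """Toy edge cache keyed by URL path and query string."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: callable(url) -> response body (hypothetical origin).
        self._fetch = fetch_from_origin
        self._store = {}

    @staticmethod
    def cache_key(url):
        # The key uses path and query string, so two requests that differ
        # only in their query string are cached as separate objects.
        parts = urlsplit(url)
        return (parts.path, parts.query)

    def handle(self, url):
        key = self.cache_key(url)
        if key in self._store:
            return self._store[key], "hit"
        # Assumed miss path: ask the origin and remember the answer.
        body = self._fetch(url)
        self._store[key] = body
        return body, "miss"
```

Because the query string is part of the key, per-user query parameters lower the hit rate: every distinct value produces a separate cache entry.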
32. CDN
• Your CDN works best if it can serve everything from cache
• How to remove stale data?
• Trade-off: high cache hit rate vs. update delay
• Time-To-Live (TTL) sets when a cached object expires
• We use 5 minutes
• 35% cache hit rate
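The TTL trade-off can be made concrete with a small sketch. It assumes a simple in-memory cache; `TTLCache` is a hypothetical class, the 300-second default mirrors the 5 minutes on the slide, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire a fixed time after being stored."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Stale: drop the entry so the caller refetches from the origin.
            del self._items[key]
            return None
        return value
```

A shorter TTL bounds how long stale data can be served, but it also forces more refetches, which lowers the cache hit rate.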
34. DNS
• Originally: Resolve domain names to IP addresses
• Also: Route request to nearest data center
• Best region for a request, based on historic latency
• Amazon: Route 53
• Others: Dyn, easyDNS, Akamai
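Latency-based routing can be sketched as picking the region with the lowest historic latency for a client. The slides do not describe how the DNS provider aggregates its measurements, so the use of the median here is an assumption, and `best_region` is a hypothetical helper.

```python
from statistics import median


def best_region(latency_samples_ms):
    """Pick the region with the lowest median historic latency.

    latency_samples_ms maps a region name to a list of past latency
    measurements in milliseconds. Regions without samples are ignored.
    """
    medians = {
        region: median(samples)
        for region, samples in latency_samples_ms.items()
        if samples
    }
    if not medians:
        raise ValueError("no latency samples for any region")
    return min(medians, key=medians.get)
```

Using the median rather than the mean keeps a single slow outlier from steering traffic away from an otherwise fast region.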
37. Load Balancer
• Route requests to application servers
• Entry point to a region
• AWS: Elastic Load Balancer (ELB)
• Others: HAProxy, nginx, F5
38. Load Balancer
• Terminate TLS
• Determine which application server to route to
• Healthy server
• ELB: Server with least outstanding requests
• Wait for results and return
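The server selection step can be sketched as below. This only illustrates the "healthy server with least outstanding requests" rule from the slide; it is not the ELB implementation, and `Server` and `pick_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True
    outstanding: int = 0  # requests sent to this server and not yet answered


def pick_server(servers):
    """Return the healthy server with the fewest outstanding requests."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy application servers")
    # min() returns the first server in list order when several are tied.
    return min(healthy, key=lambda s: s.outstanding)
```

A caller would increment `outstanding` when it forwards a request and decrement it when the response arrives, so slow servers naturally receive less new traffic.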
44. DynamoDB
• Primary/Replica
• Reads to replicas, writes only to primary
• Replicas only in 2 regions
• Reads for non-replica regions need to go over the Internet
• In-instance caching of authentication/map information
Source: https://www.mapbox.com/blog/scaling-the-mapbox-infrastructure-with-dynamodb-streams/
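The read/write routing on this slide can be sketched as a small decision function. The region names passed in are placeholders, `choose_endpoint` is a hypothetical helper, and the rule for which replica a non-replica region reads from (here: the first one listed) is an assumption the slides do not specify.

```python
def choose_endpoint(operation, local_region, primary_region, replica_regions):
    """Return the region whose database should serve this operation."""
    if operation == "write":
        # Writes only ever go to the primary.
        return primary_region
    if operation != "read":
        raise ValueError(f"unknown operation: {operation!r}")
    if not replica_regions:
        raise ValueError("at least one replica region is required")
    if local_region in replica_regions:
        # A replica exists in this region: read locally.
        return local_region
    # No local replica: the read crosses regions over the Internet.
    # Picking the first listed replica is an assumption of this sketch.
    return replica_regions[0]
```

The cross-region branch is the slow path, which is why frequently read authentication and map information is also cached inside each instance.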
46. Application Servers
Fetch tiles
• Check the cache (Redis) and the object store (S3) simultaneously
• Return the tile from whichever source finds it first
• If it is only found in the object store, update the local cache
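The concurrent lookup can be sketched with `asyncio`. The three callables `cache_get`, `store_get` and `cache_put` are hypothetical stand-ins for Redis and S3 clients (no real client API is used here), and a lookup returning `None` is assumed to mean "not found". One simplification: if the object store answers first, the sketch refills the cache without waiting to learn whether the cache lookup would have hit.

```python
import asyncio


async def fetch_tile(key, cache_get, store_get, cache_put):
    """Query cache and object store concurrently; return the first tile found.

    cache_get / store_get: async callables(key) -> bytes or None.
    cache_put: async callable(key, value) used to refill the cache.
    """

    async def tagged(source, getter):
        # Attach the source name so the caller knows who answered.
        return source, await getter(key)

    tasks = [
        asyncio.create_task(tagged("cache", cache_get)),
        asyncio.create_task(tagged("store", store_get)),
    ]
    try:
        for next_done in asyncio.as_completed(tasks):
            source, tile = await next_done
            if tile is not None:
                if source == "store":
                    # Found in the object store: refill the local cache.
                    await cache_put(key, tile)
                return tile
        return None  # neither source has the tile
    finally:
        # Stop the slower lookup; cancelling a finished task is a no-op.
        for task in tasks:
            task.cancel()
```

Racing both lookups trades extra object-store requests for lower latency on cache misses, since the slower store request is already in flight when the miss is discovered.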
47. Application Servers
• Redis is used as a least-recently-used (LRU) cache, so popular tiles for a region are usually cached
• S3 is slow, because the data is in a us-east-1 bucket only
• Stats:
• 80% cache hits
• r3.4xlarge with 122 GB of memory
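The least-recently-used policy can be illustrated with a small in-memory sketch. Redis implements its own eviction, so this shows only the policy, not Redis internals; `LRUCache` is a hypothetical class and its capacity is counted in entries rather than bytes.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        # A read counts as a use: move the entry to the "newest" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            # Evict the entry that has gone unused the longest.
            self._items.popitem(last=False)
```

Because every read refreshes an entry, tiles that are requested often in a region stay in the cache while rarely requested tiles are evicted first.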
55. Elasticity
• EC2 instances are provisioned via Auto Scaling Group
• Auto Scaling is based on instance CPU load
• Scale up if CPU load is over 55% for 2 minutes; scale down if it is under 20% for 2 minutes
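The scaling rule can be sketched as a pure decision function. It assumes the caller passes the CPU samples (in percent) that cover the last 2 minutes; how those samples are collected and aggregated across instances is not described in the slides, and `scaling_decision` is a hypothetical helper.

```python
def scaling_decision(cpu_percent_window, scale_up_above=55.0, scale_down_below=20.0):
    """Decide whether to scale, given CPU samples covering the last 2 minutes.

    Returns "scale_up", "scale_down" or "hold". A threshold must be
    breached by every sample in the window before any action is taken.
    """
    if not cpu_percent_window:
        return "hold"  # no data: do nothing
    if all(cpu > scale_up_above for cpu in cpu_percent_window):
        return "scale_up"
    if all(cpu < scale_down_below for cpu in cpu_percent_window):
        return "scale_down"
    return "hold"
```

The gap between the two thresholds acts as hysteresis: load between 20% and 55% triggers no action, which avoids repeatedly adding and removing instances.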