13. .1 March 2013
High Availability was designed…
but not implemented
One machine in each of 2 different locations:
Canada/East and Europe/West
Focus on performance, prioritizing searching over indexing
First customer in prod
RAM: 32GB
Proc: 4 cores, 3.4-3.8 GHz
SSD: 2x 120 GB RAID 0
(Intel 320)
14. .2 June 2013
Implementation of high availability in our architecture
3 machines with a consensus on write… but in the same data center
API clients handled automatic retries in case of error
APPID-1.algolia.io, APPID-2.algolia.io, APPID-3.algolia.io
RAM: 64GB
Proc: 6 cores, 3.2-3.8 GHz
SSD: 2x 300 GB RAID 0
(Intel 320)
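
The client-side retry across the three hosts can be sketched as follows. This is only an illustration under stated assumptions, not the actual API client code: `send` is a hypothetical callable standing in for the HTTP request, `request_with_retry` is an invented name, and `APPID` is a placeholder for the application ID.

```python
from typing import Callable, Sequence


def request_with_retry(hosts: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each host in order and return the first successful response.

    `send` is a hypothetical stand-in for the real HTTP call; it is
    expected to raise OSError (or a subclass) when a host is unreachable.
    """
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except OSError as exc:  # connection refused, timeout, DNS failure...
            last_error = exc  # remember the failure and move to the next host
    raise ConnectionError(f"all {len(hosts)} hosts failed") from last_error


# The three per-application hostnames from the slide (APPID is a placeholder).
HOSTS = [f"APPID-{i}.algolia.io" for i in (1, 2, 3)]
```

With three machines behind three hostnames, a request fails only if every host is unreachable.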
15. .3 August 2013
Official launch of the service
Two locations: Europe/West and Canada/East
Same provider but different network
equipment and power units (cheap multi-AZ)
10 API clients, developed manually
(HTTPS keep-alive, using TLS correctly, retry strategy…)
RAM: 128GB
Proc: 8 cores, 3.1-3.8 GHz
SSD: 2x 300 GB RAID 0
(Intel S3500)
16. .4 January 2014
Deployment is a big risk for high availability
Agile development, 6000+ unit tests, 200+ non-regression tests…
But no instant rollback! Result: 8 minutes of indexing downtime ☂
From then on:
- start with test clusters
- instant rollback
17. .5 July 2014
First deployment on two
data centers
Biggest customer so far!
In Europe, 2 different data centers
100 km apart
(already better than the AZs of cloud offerings)
RAM: 128GB
Proc: 6 cores, 3.5-3.9 GHz
SSD: 2x 400 GB RAID 0
(Intel S3700)
18. .6 October 2014
Automation via Chef
Significant increase in managed machines
Shell Scripts -> Chef
Automation is great but s**t happens…
A typo in a cookbook nearly broke our prod!
From then on: 2 versions of the cookbooks
deployed to different servers of the same cluster
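
One way to picture the two-version rule is a per-cluster rollout plan. The sketch below is an assumption-laden illustration, not the actual Chef setup: the function name, the server names and the choice of giving the newer version to exactly one server are all hypothetical.

```python
from typing import Dict, List


def assign_cookbook_versions(
    servers: List[str], stable: str, candidate: str
) -> Dict[str, str]:
    """Map each server of one cluster to a cookbook version.

    Assumption for illustration: the first server receives the candidate
    version and every other server stays on the stable one, so a broken
    cookbook can affect at most one machine of the cluster.
    """
    if not servers:
        return {}
    plan = {server: stable for server in servers}
    plan[servers[0]] = candidate
    return plan


# Hypothetical cluster of three machines and two cookbook versions.
plan = assign_cookbook_versions(["c1-1", "c1-2", "c1-3"], "1.4.0", "1.5.0")
```

Because the machines of a cluster never all run the same new cookbook at once, a typo takes down one replica at most instead of the whole cluster.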
19. .7 February 2015
Launch of our synchronized worldwide infrastructure
8 new regions!
Low latency everywhere with automatic replication
12 regions
22. .8 March 2015
Better high availability per region
Spread our US clusters across two
completely different providers
• 2 different data centers in close
locations (24 miles, 1ms latency)
• 3 different machines
• 2 completely different autonomous
systems
23. .9 May 2015
Introducing several DNS
providers
Retry strategy in API clients, again!
1. APPID-dsn.algolia.net
2. Retry randomly,
APPID-1.algolianet.com
APPID-2.algolianet.com
APPID-3.algolianet.com
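
The host ordering above can be sketched like this. It is a minimal illustration, not the real client logic: `host_order` is an invented name, `APPID` is a placeholder, and the optional `rng` parameter exists only to make the shuffle reproducible in tests.

```python
import random
from typing import List, Optional


def host_order(app_id: str, rng: Optional[random.Random] = None) -> List[str]:
    """Return the hosts to try: the DSN host first, then the fallbacks shuffled.

    The fallbacks live on a second domain (algolianet.com), so a failure of
    the first domain's DNS does not make every host unresolvable.
    """
    rng = rng or random.Random()
    fallbacks = [f"{app_id}-{i}.algolianet.com" for i in (1, 2, 3)]
    rng.shuffle(fallbacks)  # random order spreads retries across machines
    return [f"{app_id}-dsn.algolia.net"] + fallbacks


order = host_order("APPID", random.Random(0))
```

Shuffling the fallbacks keeps retrying clients from all hitting the same machine first when the primary host is unavailable.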
24. .10 July 2015
Three completely independent
providers per cluster
With 2 providers we could still
lose indexing
Clusters spanning multiple data
centers, autonomous systems and
upstream providers.
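
Why two providers are not enough for three machines can be checked with a little arithmetic. The sketch below assumes that indexing needs a majority of the machines to agree (the "consensus on write" mentioned earlier); the function name and the provider labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def survives_any_provider_outage(placement: Sequence[str]) -> bool:
    """True if a majority of machines remains after losing any one provider.

    `placement` lists the provider hosting each machine of a cluster.
    Assumption: writes require a strict majority of all machines.
    """
    if not placement:
        return False
    quorum = len(placement) // 2 + 1
    # Worst case: the provider hosting the most machines goes down.
    largest = max(Counter(placement).values())
    return len(placement) - largest >= quorum


two_providers = survives_any_provider_outage(["A", "A", "B"])
three_providers = survives_any_provider_outage(["A", "B", "C"])
```

With two providers, one of them necessarily hosts two of the three machines, and losing it leaves a single machine, which is below the majority of two. With three providers, any single outage still leaves two machines.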