Instant Search API
Build Unique Search Experiences
Sylvain Utard
VP of Engineering
sylvain@algolia.com
@sylvainutard
Enterprise Search and Analytics
@algolia
Who am I?
5 years @ Exalead, leading the core-engine & NLP teams
• C++
• ExaScript (RIP)
• Java
2 years @ Algolia, VP of Engineering
• C++
• Ruby
• Java
• and 10+ other languages…
A hosted search API
Replies in milliseconds
From anywhere
With intuitive relevance
Algolia Today
13 locations
800+ customers in 80+ countries
40B+ write operations per month
4B+ user-generated queries per month
Performance
is our DNA
Speed matters
Google: a half-second delay caused a 20% drop in traffic
Amazon: every 100 ms of latency costs them 1% in sales
Behind the scenes
Unique set of constraints
High volume of Read & Write operations
High availability
Worldwide data distribution
API Software Stack
Started as a mobile offline SDK
Written in C++
Search code embedded in Nginx as a module
Indexing is done in a separate process
Two Redis instances
API Hardware
Fast CPU (Xeon E5 >3.5GHz)
In Memory (128GB)
Backed by high-end SSDs in RAID-0 (800GB)
Specific kernel settings
Scaling horizontally
Several clusters per location
A user is assigned to one master cluster
A user can be replicated to N replica clusters
What is a cluster?
Master-Master
Stream of writes via Consensus
At least 3 machines
A write in practice
One of the machines accepts the write operation via the API (HTTPS)
/1/indexes/MyFirstIndex/batch
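As an illustration of what such a write could look like from the client side, here is a minimal sketch that builds (but does not send) a batch request. The host name pattern, the two `X-Algolia-*` headers and the `requests` / `action` / `body` payload shape are assumptions taken from Algolia's public REST documentation, not from this talk; `build_batch_request` and the placeholder credentials are hypothetical.

```python
import json
import urllib.request


def build_batch_request(app_id, api_key, index_name, records):
    """Build, without sending, a POST to /1/indexes/<index>/batch."""
    # Assumed payload shape: one "addObject" operation per record.
    payload = {
        "requests": [{"action": "addObject", "body": r} for r in records]
    }
    return urllib.request.Request(
        url=f"https://{app_id}.algolia.net/1/indexes/{index_name}/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Algolia-Application-Id": app_id,  # assumed header name
            "X-Algolia-API-Key": api_key,        # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials; nothing is sent over the network here.
req = build_batch_request(
    "YourAppID", "YourAPIKey", "MyFirstIndex", [{"name": "iPhone"}]
)
```

Passing `req` to `urllib.request.urlopen` would actually perform the write, which then goes through the steps on the following slides.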
A write in practice
The file is saved on the three machines as a temporary file
tmp1265
tmp7864
tmp2357
A write in practice
Launch the consensus by contacting the Raft master
startConsensus(tmp2357, tmp7864, tmp1265)
A write in practice
1 - The master sends the commit order to all nodes
2 - Each node returns its next job ID to the master
3 - If there is a majority, the file is committed
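The three steps above can be sketched as a toy model. This is not Raft itself and not Algolia's code: `Node`, `propose` and `commit_write` are hypothetical names, and choosing the highest proposed ID as the shared job ID is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member; not Algolia's actual data model."""
    name: str
    up: bool = True
    next_job_id: int = 1
    committed: dict = field(default_factory=dict)  # job ID -> temp file name

    def propose(self):
        # Step 2: answer the commit order with the next free job ID.
        # A host that is down does not answer at all.
        return self.next_job_id if self.up else None


def commit_write(nodes, tmp_files):
    """Return the job ID if a majority answered, else None.

    tmp_files maps node name -> temporary file saved on that node.
    """
    # Step 1: the master sends the commit order to every node.
    replies = [node.propose() for node in nodes]
    acks = [job_id for job_id in replies if job_id is not None]

    # Step 3: commit only with a strict majority of the cluster.
    if len(acks) <= len(nodes) // 2:
        return None

    # Assumption: the highest proposed ID becomes the shared job ID.
    job_id = max(acks)
    for node in nodes:
        if node.up:
            node.committed[job_id] = tmp_files[node.name]
            node.next_job_id = job_id + 1
    return job_id


nodes = [Node("a"), Node("b"), Node("c", up=False)]
tmp_files = {"a": "tmp1265", "b": "tmp7864", "c": "tmp2357"}
job_id = commit_write(nodes, tmp_files)  # two of three hosts answered
```

With two of three hosts answering, the write is committed on the two live hosts. With only one host up, `commit_write` returns `None` and nothing is committed.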
A write in practice
Same job ID on all hosts
Sent to the replica clusters in parallel
Processed in parallel on all hosts
job42
job42
job42
In case one host is down
Continue to accept writes
The two other hosts keep jobs
Jobs are sequential, so the failed host catches up at restart
job42
job42
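Because job IDs are sequential, catching up amounts to replaying the jobs the host missed, in order. A minimal sketch, assuming the surviving hosts expose their kept jobs as a mapping from job ID to payload (`Host`, `catch_up` and `job_log` are hypothetical names):

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.last_applied = 0  # ID of the last job this host processed
        self.applied = []      # payloads, in the order they were applied

    def apply(self, job_id, payload):
        # Jobs must be applied strictly in sequence, with no gaps.
        if job_id != self.last_applied + 1:
            raise ValueError(f"expected job {self.last_applied + 1}, got {job_id}")
        self.applied.append(payload)
        self.last_applied = job_id


def catch_up(host, job_log):
    """Replay every job the host missed; return how many were replayed."""
    replayed = 0
    while host.last_applied + 1 in job_log:
        next_id = host.last_applied + 1
        host.apply(next_id, job_log[next_id])
        replayed += 1
    return replayed


# Jobs kept by the two hosts that stayed up.
job_log = {1: "add iPhone", 2: "add iPad", 3: "delete iPod"}

restarted = Host("c")
restarted.apply(1, job_log[1])           # processed before the outage
replayed = catch_up(restarted, job_log)  # replays jobs 2 and 3
```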
Distribution
Replicate jobs, not the result
Send to all machines in parallel
Consistent within a few seconds
High availability
Multi-regions in one location
13 fully independent locations
Network Optimisations
API usage is moving from servers to browsers and mobile apps
Get close to end users
Distributed Search Network - Worldwide Synchronization
• 13 locations = 25 datacenters
• No ideal worldwide provider
• AWS is not in India, Eastern EU, Africa…
• Need to handle several providers
• Anticipate long deliveries / customs
• Keep as few providers as possible
DNS is key
Used to find the closest location
Several DNS providers
Good anycast network
API Clients
DNS health checks are not enough
Smart retry logic in all our API Clients
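The talk does not detail the retry strategy, so the following is only one plausible shape for it: try each host in turn and widen the timeout on every full pass. `request_with_retry`, the `send` callback, the timeout values and the host names are all hypothetical.

```python
def request_with_retry(hosts, send, timeouts=(1.0, 4.0)):
    """Try each host in order; use a longer timeout on each full pass.

    `send(host, timeout)` performs one attempt and either returns a
    response or raises ConnectionError / TimeoutError.
    """
    last_error = None
    for timeout in timeouts:
        for host in hosts:
            try:
                return send(host, timeout)
            except (ConnectionError, TimeoutError) as exc:
                last_error = exc  # remember the failure, try the next host
    raise RuntimeError("all hosts failed") from last_error


def fake_send(host, timeout):
    # Stand-in for a real HTTPS call: the closest host is unreachable.
    if host == "closest.example.net":
        raise ConnectionError("host unreachable")
    return f"200 OK from {host}"


result = request_with_retry(
    ["closest.example.net", "fallback.example.net"], fake_send
)
```

Doing this inside the client means a failed host costs one extra round trip, without waiting for a DNS health check to notice the outage.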
Analytics
• What are my users searching for?
• Top searches
• Top searches without hits
• Top refinements
• Where do they search from?
Analytics
• Billions of user-generated queries per month
• As-you-type aggregation
• ~3 months of retention
• Storing all of them in…
Analytics
• Elasticsearch o/
• … without FTS :)
• but with aggregations
Analytics
• No FTS
• No source
• Doc values everywhere
• SSD only
• Custom aggregations
(deprecated since ES 1.1.0)
Top-k Aggregation
• Before
  • Linear memory consumption
  • Exhaustive
• After
  • Constant memory consumption
  • Approximate, but good enough
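The slides do not name the algorithm behind the "After" column. One well-known way to get an approximate top-k in constant memory is the Space-Saving algorithm, sketched here as an illustration only (`space_saving` and `top_k` are hypothetical names, and this is not the custom Elasticsearch aggregation itself):

```python
def space_saving(stream, capacity):
    """Approximate item counts using at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count,
            # so counts may overestimate but never underestimate.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_k(counters, k):
    """Return the k items with the highest (approximate) counts."""
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:k]


queries = ["iphone"] * 5 + ["ipad"] * 3 + ["a", "b", "c"]
counters = space_saving(queries, capacity=3)
best = top_k(counters, 1)
```

Memory stays bounded by `capacity` however many distinct queries appear. The price is that rare queries can show inflated counts, which is acceptable when only the top of the ranking matters.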
Building your worldwide infra
- Is a long and difficult quest
- Is a real asset & differentiator
The Future of APIs
is Distributed
Want to know more?
All the details of our architecture are on HighScalability.com
THANK YOU!
sylvain@algolia.com
We are hiring in SF, NYC and Paris 😊

Algolia - Hosted Search API