SlideShare a Scribd company logo
Instant Search API
Build Unique Search Experiences
Sylvain Utard
VP of Engineering
sylvain@algolia.com
@sylvainutard
Enterprise Search and Analytics
@algolia
Who am I?
5 years @ Exalead, leading the core-engine & NLP teams
• C++
• ExaScript (RIP)
• Java
2 years @ Algolia, VP of Engineering
• C++
• Ruby
• Java
• and 10+ other languages…
@sylvainutard
@algolia
A hosted search API
@algolia
A hosted search API
@algolia
@algolia
A hosted search API
Replies in milliseconds
@algolia
A hosted search API
Replies in milliseconds
From anywhere
@algolia
A hosted search API
Replies in milliseconds
From anywhere With intuitive relevance
@algolia
Algolia Today
@algolia
800+customers in 80+ countries
Algolia Today
@algolia
800+customers in 80+ countries
40B+ Write operationsper month
4B+ User-generated queriesper month
Algolia Today
@algolia
Algolia Today
13locations
800+customers in 80+ countries
40B+ Write operationsper month
4B+ User-generated queriesper month
@algolia
Performance
is our DNA
@algolia
Speed matters
Half a second delay

caused 20% drop in traffic
Every 100ms of latency

costs them 1% in sales
@algolia
Behind the scene
@algolia
Unique set of constraints
High volume of Read & Write operations
@algolia
Unique set of constraints
High volume of Read & Write operations
High-availability
@algolia
Unique set of constraints
High volume of Read & Write operations
High-availability
Worldwide data distribution
@algolia
API Software Stack
Started as a mobile offline SDK
Written in C++
Search code embedded in Nginx as a module
Indexing is done in a separate process
Two redis instances
@algolia
API Hardware
Fast CPU (Xeon E5 >3.5GHz)
In Memory (128GB)
Backed by High-end SSD in Raid-0 (800GB)
Specific kernel settings
@algolia
Scaling horizontally
Several clusters per location
A user is assigned to one master cluster
A user can be replicated to N replicate clusters
@algolia
What is a cluster
Master-Master
Stream of writes via Consensus
At least 3 machines
@algolia
A write in practice
One of the machines accept
the write operation via the API (https)
/1/indexes/MyFirstIndex/batch
@algolia
A write in practice
The file is saved on the three machines
as a temporary file
tmp1265
tmp7864
tmp2357
@algolia
A write in practice
Launch the consensus by contacting
the RAFT master
startConsensus(tmp2357, tmp7864, tmp1265)
@algolia
A write in practice
1 -Master send the commit order to all nodes
2- Each node returns the next job ID to master
3- If there is a majority the file is committed
@algolia
A write in practice
Same job ID on all hosts
Send to slave replicate in parallel
Processed in parallel on all hosts
job42
job42
job42
@algolia
In case one host is down
Continue to accept writes
The two other hosts keep jobs
Jobs are sequential, will catch up at restart
job42job42
@algolia
Distribution
Replicate jobs, not the result
Send to all machines in parallel
Consistent with few seconds delay
@algolia
High availability
Multi-regions in one location
@algolia
High availability
13 fully independent locations
@algolia
Network Optimisations
API usage moving from servers to
browser and mobile apps
Get close to end users
@algolia
Distributed Search Network - Worldwide Synchronization
@algolia
Distributed Search Network - Worldwide Synchronization
@algolia
• 13 locations = 25 datacenters
• No ideal worldwide provider
• AWS is not in India, Eastern EU, Africa…
• Need to handle several providers
• Anticipate long deliveries / customs
• Keep as few providers as possible
Distributed Search Network - Worldwide Synchronization
@algolia
DNS is key
Used to find the closest location
Several DNS providers
Good anycast network
@algolia
API Clients
DNS health checks are not enough
Smart retry logic in all our API Clients
@algolia
Analytics
• What are my users searching for?
• Top search
• Top search without hits
• Top refinements
• From where do they search for?
@algolia
@algolia
@algolia
Analytics
• Billions of user-generated queries per month
• As-you-type aggregation
• ~3 months retentions
• Storing all of them in…
@algolia
Analytics
• Elasticsearch o/
• … without FTS :)
• but with aggregations
@algolia
Analytics• No FTS
• No source
• Doc values everywhere
• SSD only
• Custom aggregations
(deprecated since ES 1.1.0)
@algolia
Top-k Aggregation
• Before
• Linear memory consumption
• Exhaustivity
• After
• Constant memory consumption
• Approximative but enough
@algolia
Building your worldwide infra
- Is long and difficult quest
- Is a real asset & differentiator
The Future of APIs
is Distributed
@algolia
All the details of our architecture
are on HighScalability.com
Want to know more?
THANK YOU!
sylvain@algolia.com
@algolia
Build Unique Search Experiences
W
e are hiring in SF, NYC and Paris 😊

More Related Content

What's hot

What's hot (20)

Hasura 2.0 Webinar
Hasura 2.0   WebinarHasura 2.0   Webinar
Hasura 2.0 Webinar
 
Rapid JCR applications development with Sling
Rapid JCR applications development with SlingRapid JCR applications development with Sling
Rapid JCR applications development with Sling
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
 
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic Architecture
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Hasura
HasuraHasura
Hasura
 
Gradle Introduction
Gradle IntroductionGradle Introduction
Gradle Introduction
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Redis
RedisRedis
Redis
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Microservices Docker Kubernetes Istio Kanban DevOps SRE
Microservices Docker Kubernetes Istio Kanban DevOps SREMicroservices Docker Kubernetes Istio Kanban DevOps SRE
Microservices Docker Kubernetes Istio Kanban DevOps SRE
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Quarkus k8s
Quarkus   k8sQuarkus   k8s
Quarkus k8s
 

Similar to Algolia - Hosted Search API

In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
Chinmay Kulkarni
 
AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
AWS Summit Tel Aviv - Enterprise Track - Data WarehouseAWS Summit Tel Aviv - Enterprise Track - Data Warehouse
AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
Amazon Web Services
 

Similar to Algolia - Hosted Search API (20)

Algolia's Fury Road to a Worldwide API - Take Off Conference 2016
Algolia's Fury Road to a Worldwide API - Take Off Conference 2016Algolia's Fury Road to a Worldwide API - Take Off Conference 2016
Algolia's Fury Road to a Worldwide API - Take Off Conference 2016
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
 
Fury road to a worldwide API - API Days - December 2015
Fury road to a worldwide API - API Days - December 2015Fury road to a worldwide API - API Days - December 2015
Fury road to a worldwide API - API Days - December 2015
 
Algolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide APIAlgolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide API
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
 
ACDKOCHI19 - Technical Presentation - Connecting 10000 cars to the AWS Cloud
ACDKOCHI19 - Technical Presentation - Connecting 10000 cars to the AWS CloudACDKOCHI19 - Technical Presentation - Connecting 10000 cars to the AWS Cloud
ACDKOCHI19 - Technical Presentation - Connecting 10000 cars to the AWS Cloud
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Airflow @ Agari
Airflow @ Agari Airflow @ Agari
Airflow @ Agari
 
GraphQL API on a Serverless Environment
GraphQL API on a Serverless EnvironmentGraphQL API on a Serverless Environment
GraphQL API on a Serverless Environment
 
MassTLC Cloud Summit Keynote
MassTLC Cloud Summit KeynoteMassTLC Cloud Summit Keynote
MassTLC Cloud Summit Keynote
 
AWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runnersAWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runners
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
AWS Summit Tel Aviv - Enterprise Track - Data WarehouseAWS Summit Tel Aviv - Enterprise Track - Data Warehouse
AWS Summit Tel Aviv - Enterprise Track - Data Warehouse
 
Where Is My Data - ILTAM Session
Where Is My Data - ILTAM SessionWhere Is My Data - ILTAM Session
Where Is My Data - ILTAM Session
 

More from enterprisesearchmeetup

Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
enterprisesearchmeetup
 

More from enterprisesearchmeetup (6)

Cisco meetup-25 april2017
Cisco meetup-25 april2017Cisco meetup-25 april2017
Cisco meetup-25 april2017
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregations
 
The Elastic ELK Stack
The Elastic ELK StackThe Elastic ELK Stack
The Elastic ELK Stack
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
 
Scalable Search Analytics
Scalable Search AnalyticsScalable Search Analytics
Scalable Search Analytics
 
Practical Relevance Measurement
Practical Relevance MeasurementPractical Relevance Measurement
Practical Relevance Measurement
 

Recently uploaded

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 

Algolia - Hosted Search API