Scaling when you are not Google
Abel Muiño
Abel Muiño
‣ Lead Software Engineer
‣ Tweets as @amuino
‣ In another life, co-owned 1uptalent.com, played with Docker and used AWS for everything.
Disclaimer
‣ Cabify is 5 years old
‣ I joined Cabify about 1.5 years ago to work on product
‣ What you will hear today might be
‣ 70% folklore / 30% experience
‣ Only about production
‣ Not applicable to other areas (data analytics)
Cabify
[Chart: Completed Journeys per year, 2011 to 2016]
(Axis has no legend because NDA and stuff)
[Chart: Backend committers per year, 2011 to 2016; y-axis from 0 to 18]
We are hiring!
(As if it wasn’t obvious from the charts)
Circadian rhythm
Prelude
???? - 2014
Cabify foundations
‣ Mostly Ruby, some Go
‣ Running on VPS
‣ No sysadmins (devops?)
‣ CouchDB
‣ Redis
‣ Home-grown metrics & monitoring (limited)
Servers
‣ 3 ⨉ Host servers
‣ Horizontally scalable
‣ Most services included (sidecars)
‣ Front + Back + Queue workers
‣ 1 ⨉ Realtime server
‣ Single Point of Failure
‣ Ansible for setting them up
[Diagram: VPS provider hosting LB, web1, web2, web3, worker1, redis1, redis2, elastic, realtime, osrm and websock]
CouchDB
‣ Used to be run in-house → Unreliable
‣ Moved to Cloudant
‣ Managed
‣ Bare metal servers
‣ Requirement for everything else: run in the same datacenter
‣ …because the network matters
Database of choice for Cabify
Pros
‣ Cheap servers
‣ Professional DB management
‣ Still cheaper than in-house staff
‣ Scales up by either
‣ Emailing Cloudant
‣ Deploying new VPSs
Cons
‣ Datacenter lock-in
‣ Scarce visibility on load
‣ Low VPS utilization (for some services)
Tl;dr: everything was fine
Until it wasn’t
2015 Road to bare metal
In 2014 we handled
7 times the load of 2013
Installed NewRelic
‣ Monitors our ruby stack
‣ Built custom adapters for API toolkit and CouchDB
‣ Golang not supported 😭
‣ Low hanging fruit for increasing performance
‣ Hint: Always contact a Sales Rep
‣ Bye bye home-grown monitoring! 👋
VPS provider
DDoSed
‣ Several times a week
‣ Cabify was unreachable
‣ VPSs were unreachable on the internal network
‣ Slow & bad support
‣ Reputation
‣ Solution: Level up!
Nobody ever got fired for choosing IBM
Moved to Bare Metal @ Softlayer
Same guys hosting our Cloudant cluster 👍
Mindset
Control the core, minimise work for everything else
Everything must go
[Diagram: VPS provider hosting web1, web2, web3, worker1, realtime, LB, redis1, redis2, elastic, osrm and subscriber]
Load Balancer
‣ Multiple PoPs (starting operations in several countries)
‣ CDN
‣ Supporting websockets
‣ … and Load Balancing
‣ Low TCO
‣ https://www.incapsula.com
Redis, ElasticSearch
‣ Same datacenter
‣ Completely managed
‣ Clustered / reliable
‣ RedisLabs
‣ Bonus: Memcached
‣ Qbox
OSRM
‣ Same datacenter
‣ Completely managed
‣ Enhanced dataset
‣ Google Maps & Places (with enterprise license)
‣ 2 / 3, good enough
Can do better?
Can we manage less infra?
[Diagram: Softlayer with web1, web2, web3, worker1, realtime and subscriber, alongside Google, Incapsula, Redislabs, Qbox and Cloudant]
Subscriber
‣ Felt like reinventing the wheel
‣ Looked for battle-tested bus / queue / broker
‣ In the same datacenter
‣ Had previous experience with RabbitMQ
‣ CloudAMQP
Homebrew message bus / queue
Sidecars
‣ Every server could run Cabify
‣ All services installed
‣ Except Realtime (SPOF)
‣ Horizontal scaling
‣ Good server utilisation (bare metal servers are larger)
Make each host self-sufficient
Cut our own servers by 50%
Served 5 times more requests
[Diagram: Softlayer with host01, host02, host03 and realtime, alongside Google, Incapsula, Redislabs, Qbox, Cloudant and CloudAMQP]
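As a rough check on what those two numbers imply together, the sketch below computes the change in requests served per server. The helper name `per_server_factor` is made up for illustration; the only inputs are the two figures from the slide (half the servers, 5 times the requests).

```python
def per_server_factor(request_factor, server_factor):
    """Return how much the requests handled per server changed.

    request_factor: new requests / old requests (5 on the slide)
    server_factor:  new servers / old servers (0.5 on the slide)
    """
    if server_factor <= 0:
        raise ValueError("server_factor must be positive")
    return request_factor / server_factor


# Figures from the slide: servers cut by 50%, 5 times more requests.
bare_metal_2015 = per_server_factor(request_factor=5, server_factor=0.5)
print(f"Each server handled about {bare_metal_2015:g}x the requests")
```

Under that reading, each remaining bare metal server handled roughly 10 times the requests of one of the previous servers.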
Pros
‣ Same-datacenter latencies
‣ Only care about our product
‣ Still cheaper than in-house staff
‣ Scales up by either
‣ Emailing a provider
‣ Deploying new servers
‣ Good visibility on perf
Cons
‣ Datacenter lock-in
‣ Still no visibility on Golang perf
‣ Competing services on each server with different needs
‣ Fast & light http requests
‣ Slow & heavy queue workers
‣ Debuggability
Tl;dr: everything was fine
Until it wasn’t
2016, pushing to the limit
In 2015 we handled
5 times the load of 2014
In 2016 we would invade LatAm
(new countries, cities, marketing…)
Bumps on the road
‣ Started seeing intermittent latency spikes on Cloudant
‣ Disabled some services to get back on track
‣ Tied to peak hours
‣ We lived through these, but it was stressful
Be easy on the database
‣ Removed frequent N+1 query patterns
‣ Moved some queries to ElasticSearch
‣ Started caching more on Memcached
‣ Grew the cluster
‣ From 200ms to 100ms (average) 👏
(trying to sleep better)
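To illustrate the first and third bullets: an N+1 pattern issues one query for a list and then one more per item, while a batched lookup needs a fixed number of round trips, and a cache avoids repeating reads altogether. The sketch below uses a fake in-memory store that only counts round trips; the class names, document ids and fields are invented for the example and are not Cabify's code.

```python
class FakeDocumentStore:
    """Stand-in for a remote document database; it only counts round trips."""

    def __init__(self, docs):
        self._docs = dict(docs)
        self.round_trips = 0

    def get(self, doc_id):
        self.round_trips += 1
        return self._docs[doc_id]

    def get_many(self, doc_ids):
        # One request for many documents (a bulk fetch).
        self.round_trips += 1
        return [self._docs[doc_id] for doc_id in doc_ids]


def driver_names_n_plus_one(store, journey_ids):
    journeys = store.get_many(journey_ids)  # 1 round trip
    # N more round trips, one per journey.
    return [store.get(j["driver_id"])["name"] for j in journeys]


def driver_names_batched(store, journey_ids):
    journeys = store.get_many(journey_ids)  # 1 round trip
    drivers = store.get_many([j["driver_id"] for j in journeys])  # 1 round trip
    return [d["name"] for d in drivers]


class ReadThroughCache:
    """Serves repeated reads from memory; invalidation is left out on purpose."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, doc_id):
        if doc_id not in self._cache:
            self._cache[doc_id] = self._store.get(doc_id)
        return self._cache[doc_id]


def build_store():
    # Hypothetical documents, just enough to exercise the functions above.
    return FakeDocumentStore({
        "journey:1": {"driver_id": "driver:a"},
        "journey:2": {"driver_id": "driver:b"},
        "journey:3": {"driver_id": "driver:a"},
        "driver:a": {"name": "Ana"},
        "driver:b": {"name": "Luis"},
    })
```

With three journeys, `driver_names_n_plus_one` makes four round trips and `driver_names_batched` makes two; the gap grows linearly with the number of journeys. A real cache would also need an expiry or invalidation policy, which this sketch does not attempt.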
RabbitMQ can’t cope
‣ We saturated the cluster CPU with moderate load
‣ Tied to us using tag-based routing
‣ Messages were delivered much later than expected
‣ Made changes to use simpler routing
‣ Is there anything simpler than RabbitMQ for simple routing? 🤔
Interlude
DynDNS goes down, Cloudant uses them
We lose access to our database cluster load balancer
Patched /etc/hosts with the actual IPs in 30 minutes
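The workaround amounts to pinning hostnames to known IP addresses so that lookups skip DNS. A minimal sketch of generating such entries follows; the hostname and address are placeholders (example.com and the 192.0.2.0/24 documentation range), not Cloudant's real ones, and `render_hosts_entries` is a hypothetical helper.

```python
import ipaddress


def render_hosts_entries(pinned):
    """Render /etc/hosts-style lines, one '<ip><TAB><hostname>' per entry.

    pinned maps a hostname to the IP address it should resolve to.
    Addresses are validated so a typo does not end up in the file.
    """
    lines = []
    for hostname, ip in sorted(pinned.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"


# Placeholder values only; the real hostnames and IPs are not in the slides.
pinned_hosts = {"db.example.com": "192.0.2.10"}
entries = render_hosts_entries(pinned_hosts)
print(entries, end="")
```

Pinned entries go stale if the provider changes addresses, so a patch like this is best treated as temporary and removed once DNS is healthy again.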
The right tool for the job
‣ CouchDB / Cloudant: not the best database for frequent updates
‣ Looking for alternatives to store fast-changing models
‣ RethinkDB
‣ Fast, easy to use, hosted options in same datacenter
‣ Streaming query updates
Expecting growth in line with previous years
Broke RethinkDB load balancer
Database stats were OK, but the LB couldn’t handle our rate
Slow support, no “enterprise” option
Decided to phase out RethinkDB
Wrote our first «database»
Simple in-memory store, backed by CouchDB
Update indexes on writes. All queries are indexed
Implemented in Golang, consumed from Ruby
Replaces RethinkDB, which replaced CouchDB
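The slides do not show the store's code. As a rough illustration of "update indexes on writes, all queries are indexed", here is a minimal sketch in Python (the original was written in Golang). `IndexedStore`, the `state` field and the plain dict standing in for CouchDB are all assumptions made for the example.

```python
from collections import defaultdict


class IndexedStore:
    """In-memory documents with per-field indexes maintained on every write."""

    def __init__(self, indexed_fields, backing):
        self._fields = tuple(indexed_fields)
        self._backing = backing  # stand-in for the durable database
        self._docs = {}
        # field -> value -> set of document ids (values must be hashable)
        self._indexes = {f: defaultdict(set) for f in self._fields}

    def put(self, doc_id, doc):
        old = self._docs.get(doc_id)
        if old is not None:
            self._unindex(doc_id, old)  # drop stale index entries first
        self._docs[doc_id] = dict(doc)
        for field in self._fields:
            if field in doc:
                self._indexes[field][doc[field]].add(doc_id)
        self._backing[doc_id] = dict(doc)  # write through to the backing store

    def _unindex(self, doc_id, doc):
        for field in self._fields:
            if field in doc:
                bucket = self._indexes[field][doc[field]]
                bucket.discard(doc_id)
                if not bucket:
                    del self._indexes[field][doc[field]]

    def find(self, field, value):
        """Return matching ids using the index only; unindexed fields are refused."""
        if field not in self._indexes:
            raise KeyError(f"no index on {field!r}")
        return sorted(self._indexes[field].get(value, ()))


# Usage with made-up driver documents.
backing = {}
store = IndexedStore(["state"], backing)
store.put("driver:1", {"state": "free"})
store.put("driver:2", {"state": "busy"})
store.put("driver:1", {"state": "busy"})  # update moves driver:1 between buckets
print(store.find("state", "busy"))
```

Because every query goes through an index kept current at write time, reads never scan the full set of documents; the cost is paid on each write instead, which suits data that is read far more often than it is queried in new ways.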
Cloudant latency spikes fixed!
Grow the cluster for the second time in the year
Load balancers hardware upgraded, problems gone
Also reduced the number of connections from Ruby
Relax the Sidecars
‣ Load on background workers interfering with serving http
‣ Split the servers:
‣ Front (ruby/golang http interface)
‣ Workers (ruby job queues, ruby background)
Remove RabbitMQ
Replace with NSQ
Nice mix of sidecar and discovery
Multiplied our own servers by 3
Served 4 times more requests
[Diagram: Softlayer with host01-09, rt01-02 and work01-03, alongside Google, Incapsula, Redislabs, Qbox, Cloudant and CloudAMQP]
Pros
‣ Despite the problems, we had top-notch support from Cloudant
‣ Easy to scale out
‣ In-process database opened doors to new features
Cons
‣ Datacenter lock-in
‣ Still no visibility on Golang perf
Cabify @ 2017
In 2016 we handled
4 times the load of 2015
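Chaining the yearly multipliers quoted in this talk (7 times in 2014, 5 times in 2015, 4 times in 2016) gives the overall growth relative to 2013. A minimal sketch of that arithmetic; the dictionary and variable names are only for illustration.

```python
import math

# Year-over-year load multipliers as stated on the slides.
yearly_growth = {2014: 7, 2015: 5, 2016: 4}

# Cumulative load relative to 2013, year by year.
cumulative = {}
running = 1
for year in sorted(yearly_growth):
    running *= yearly_growth[year]
    cumulative[year] = running

total = math.prod(yearly_growth.values())
print(cumulative)
print(total)
```

Taken at face value, the load at the end of 2016 was 140 times that of 2013.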
Hired our first full-time sysadmin!
Taking ownership
Improve our infra
Own load balancers
‣ Still use Incapsula for its PoPs
‣ Achieved much better load balancing
‣ 3 new dedicated servers
Better control & traceability
Plans for the future
Own redis cluster
‣ Migrating from Redislabs hosted to Redislabs Enterprise
‣ hosted used virtual servers
‣ we rely heavily on redis (and memcached)
‣ 3 new dedicated servers
‣ WIP
Better control & traceability
Ruby → Elixir
‣ Fun to code with
‣ Higher performance
‣ Less memory
‣ Investment, about to release first service to production
Extract from Product
Dedicated teams and resources for specific components
Make the core of Cabify leaner
Thanks!
And sorry for the 60 slides
Questions?
Abel Muiño
@amuino

g2nightmarescribd
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

Mad scalability: Scaling when you are not Google

  • 1. Scaling when you are not Google Abel Muiño
  • 2. Abel Muino ‣ Lead Software Engineer ‣ Tweets as @amuino ‣ In another life, co-owned 1uptalent.com, played with Docker and used AWS for everything.
  • 3. Disclaimer ‣ Cabify is 5 years old ‣ I joined Cabify about 1.5 years ago to work on product ‣ What you will hear today might be ‣ 70% folklore / 30% experience ‣ Only about production ‣ Not applicable to other areas (data analytics)
  • 6. Completed Journeys, 2011 to 2016 (chart; axis has no legend because NDA and stuff)
  • 8. We are hiring! (As if it wasn’t obvious from the charts)
  • 11. Cabify foundations ‣ Mostly Ruby, some Go ‣ Running on VPS ‣ No sysadmins (devops?) ‣ CouchDB ‣ Redis ‣ Home-grown metrics & monitoring (limited)
  • 12. Servers ‣ 3 ⨉ Host servers ‣ Horizontally scalable ‣ Most services included (sidecars) ‣ Front + Back + Queue workers ‣ 1 ⨉ Realtime server ‣ Single Point of Failure ‣ Ansible for setting them up (Diagram: VPS provider, LB, web1, web2, web3, worker1, redis1, redis2, elastic, realtime, osrm, websock)
  • 13. CouchDB: database of choice for Cabify ‣ Used to be run in-house → Unreliable ‣ Moved to Cloudant ‣ Managed ‣ Bare metal servers ‣ Requirement for everything else: run in the same datacenter ‣ …because the network matters
  • 14. Pros ‣ Cheap servers ‣ Professional DB management ‣ Still cheaper than in-house staff ‣ Scales up by either ‣ Emailing Cloudant ‣ Deploying new VPSs Cons ‣ Datacenter lock-in ‣ Scarce visibility on load ‣ Low VPS utilization (for some services)
  • 15. Tl;dr: everything was fine Until it wasn’t
  • 16. 2015 Road to bare metal
  • 17. In 2014 we handled 7 times the load of 2013
  • 18. Installed NewRelic ‣ Monitors our ruby stack ‣ Built custom adapters for API toolkit and CouchDB ‣ Golang not supported 😭 ‣ Low hanging fruit for increasing performance ‣ Hint: Always contact a Sales Rep ‣ Bye bye home-grown monitoring! 👋
  • 19. VPS provider DDoSed ‣ Several times a week ‣ Cabify was unreachable ‣ VPSs were unreachable on the internal network ‣ Slow & bad support ‣ Reputation ‣ Solution: Level up!
  • 20. Nobody ever got fired for choosing IBM. Moved to Bare Metal @ Softlayer. Same guys hosting our Cloudant cluster 👍
  • 21. Mindset: control the core, minimise work for everything else
  • 22. Everything must go (Diagram: VPS provider, web1, web2, web3, worker1, realtime, LB, redis1, redis2, elastic, osrm, subscriber)
  • 23. Load Balancer ‣ Multiple PoP (starting operations in several countries) ‣ CDN ‣ Supporting websockets ‣ … and Load Balancing ‣ Low TCO ‣ https://www.incapsula.com
  • 24. Redis, ElasticSearch ‣ Same datacenter ‣ Completely managed ‣ Clustered / reliable ‣ RedisLabs ‣ Bonus: Memcached ‣ Qbox
  • 25. OSRM ‣ Same datacenter ‣ Completely managed ‣ Enhanced dataset ‣ Google Maps & Places (with enterprise license) ‣ 2 / 3, good enough
  • 26. Can do better? Can we manage less infra? (Diagram: Softlayer, web1, web2, web3, worker1, realtime, subscriber, Google, Incapsula, Redislabs, Qbox, Cloudant)
  • 27. Subscriber: homebrew message bus / queue ‣ Felt like reinventing the wheel ‣ Looked for battle-tested bus / queue / broker ‣ In the same datacenter ‣ Had previous experience with RabbitMQ ‣ CloudAMQP
  • 28. Sidecars: make each host self-sufficient ‣ Every server could run Cabify ‣ All services installed ‣ Except Realtime (SPOF) ‣ Horizontal scaling ‣ Good server utilisation (bare metal servers are larger)
  • 29. Cut our own servers by 50%, served 5 times more requests (Diagram: Softlayer, host01, host02, host03, realtime, Google, Incapsula, Redislabs, Qbox, Cloudant, CloudAMQP)
  • 30. Pros ‣ Same-datacenter latencies ‣ Only care about our product ‣ Still cheaper than in-house staff ‣ Scales up by either ‣ Emailing a provider ‣ Deploying new Servers ‣ Good visibility on perf Cons ‣ Datacenter lock-in ‣ Still no visibility on Golang perf ‣ Competing services on each server with different needs ‣ Fast & light http requests ‣ Slow & heavy queue workers ‣ Debuggability
  • 31. Tl;dr: everything was fine Until it wasn’t
  • 33. In 2015 we handled 5 times the load of 2014
  • 34. In 2016 we would invade LatAm (new countries, cities, marketing…)
  • 35. Bumps on the road ‣ Started seeing intermittent latency spikes on Cloudant ‣ Disable some services, get back on track ‣ Tied to peak hours ‣ We lived through these, but it was stressful
  • 36. Be easy on the database (trying to sleep better) ‣ Removed frequent N+1 query patterns ‣ Moved some queries to ElasticSearch ‣ Started caching more on Memcached ‣ Grew the cluster ‣ From 200ms to 100ms (average) 👏
  • 37. RabbitMQ can’t cope ‣ We saturated the cluster CPU with moderate load ‣ Tied to us using tag-based routing ‣ Messages were delivered much later than expected ‣ Made changes to use simpler routing ‣ Is there anything simpler than RabbitMQ for simple routing? 🤔
  • 38. Interlude: DynDNS goes down, and Cloudant uses them. We lose access to our database cluster load balancer. Patched /etc/hosts with the actual IPs in 30 minutes.
  • 39. The right tool for the job (expecting growth in line with previous years) ‣ CouchDB / Cloudant is not the best database for frequent updates ‣ Looking for alternatives to store fast-changing models ‣ RethinkDB ‣ Fast, easy to use, hosted options in same datacenter ‣ Streaming query updates
  • 40. Broke RethinkDB load balancer. Database stats were OK, but the LB couldn’t handle our rate. Slow support, no “enterprise” option. Decided to phase out RethinkDB.
  • 41. Wrote our first «database». Simple in-memory store, backed by CouchDB. Update indexes on writes. All queries are indexed. Implemented in Golang, consumed from Ruby. Replaces RethinkDB, which replaced CouchDB.
  • 42. Cloudant latency spikes fixed! Grew the cluster for the second time in the year. Load balancer hardware upgraded, problems gone. Also reduced the number of connections from Ruby.
  • 43. Relax the Sidecars ‣ Load on background workers interfering with serving http ‣ Split the servers: ‣ Front (ruby/golang http interface) ‣ Workers (ruby job queues, ruby background)
  • 44. Remove RabbitMQ, replace with NSQ. Nice mix of sidecar and discovery.
  • 45. Multiplied our own servers by 3, served 4 times more requests (Diagram: Softlayer, host01-09, rt01-02, work01-03, Google, Incapsula, Redislabs, Qbox, Cloudant, CloudAMQP)
  • 46. Pros ‣ Despite the problems, we had top-notch support from Cloudant ‣ Easy to scale out ‣ In-process database opened doors to new features Cons ‣ Datacenter lock-in ‣ Still no visibility on Golang perf
  • 48. In 2016 we handled 4 times the load of 2015
  • 51. Own load balancers: better control & traceability ‣ Still use Incapsula for its PoP ‣ Achieved much better load balancing ‣ 3 new dedicated servers
  • 52. Plans for the future
  • 53. Own Redis cluster: better control & traceability ‣ Migrating from Redislabs hosted to Redislabs Enterprise ‣ Hosted used virtual servers ‣ We rely heavily on Redis (and Memcached) ‣ 3 new dedicated servers ‣ WIP
  • 54. Ruby → Elixir ‣ Fun to code with ‣ Higher performance ‣ Less memory ‣ Investment, about to release first service to production
  • 55. Extract from Product: dedicated teams and resources for specific components. Make the core of Cabify leaner.
  • 56. Thanks! And sorry for the 60 slides
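
Slide 41 describes a simple in-memory store, backed by CouchDB, that updates its indexes on writes so that every query is an index lookup. The original was written in Golang and its design is not shown in the slides, so what follows is only a minimal Python sketch of that general idea. The class name `IndexedStore`, its methods, and the plain dict standing in for the durable database are all hypothetical.

```python
from collections import defaultdict


class IndexedStore:
    """In-memory document store; secondary indexes are maintained on every write."""

    def __init__(self, indexed_fields, backing=None):
        self.indexed_fields = tuple(indexed_fields)
        # Assumption: a plain dict stands in for the durable database.
        self.backing = backing if backing is not None else {}
        self.docs = {}
        # field -> value -> set of document ids
        self.indexes = {f: defaultdict(set) for f in self.indexed_fields}
        # Warm the in-memory copy from the backing store.
        for doc_id, doc in self.backing.items():
            self._index(doc_id, dict(doc))

    def _index(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field in self.indexed_fields:
            if field in doc:
                self.indexes[field][doc[field]].add(doc_id)

    def _unindex(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for field in self.indexed_fields:
            if field in old:
                bucket = self.indexes[field][old[field]]
                bucket.discard(doc_id)
                if not bucket:
                    del self.indexes[field][old[field]]

    def put(self, doc_id, doc):
        """Write a document: drop stale index entries, re-index, then persist."""
        self._unindex(doc_id)
        self._index(doc_id, dict(doc))
        self.backing[doc_id] = dict(doc)

    def find(self, field, value):
        """Answer a query from the index only; unindexed fields are rejected."""
        if field not in self.indexes:
            raise KeyError(f"no index on {field!r}")
        ids = self.indexes[field].get(value, set())
        return [self.docs[i] for i in sorted(ids)]
```

Writes pay the indexing cost once, so a read such as `store.find("state", "busy")` never scans the whole data set. Rejecting queries on unindexed fields is one way to keep the "all queries are indexed" property honest.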
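
Slide 36 mentions removing frequent N+1 query patterns. The talk does not say how this was done, so the sketch below only illustrates the pattern and one common fix, batching the lookups into a single round trip. `CountingDB`, the journey/driver documents, and both helper functions are hypothetical. A real CouchDB client could batch by posting a `keys` list to the `_all_docs` endpoint.

```python
class CountingDB:
    """Toy stand-in for a remote document database that counts round trips."""

    def __init__(self, docs):
        self.docs = docs
        self.round_trips = 0

    def get(self, doc_id):
        self.round_trips += 1
        return self.docs[doc_id]

    def get_many(self, doc_ids):
        # A single request that returns several documents at once.
        self.round_trips += 1
        return {doc_id: self.docs[doc_id] for doc_id in doc_ids}


def driver_names_n_plus_one(db, journeys):
    # One lookup per journey: N extra round trips for N journeys.
    return [db.get(j["driver_id"])["name"] for j in journeys]


def driver_names_batched(db, journeys):
    # Collect the distinct ids first, then fetch them all in one round trip.
    ids = sorted({j["driver_id"] for j in journeys})
    drivers = db.get_many(ids)
    return [drivers[j["driver_id"]]["name"] for j in journeys]
```

Both functions return the same names. The difference is in `db.round_trips`, which grows with the number of journeys in the first version and stays at one in the second.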
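
The growth figures on slides 17, 33 and 48 and the server counts on slides 29 and 45 can be combined into rough totals. This is a back-of-the-envelope sketch. It assumes that "load" and "requests" refer to the same measure and that "N times more" means a factor of N. The function names are hypothetical.

```python
from fractions import Fraction

# Year-over-year load multipliers quoted on slides 17, 33 and 48.
load_growth = {2014: 7, 2015: 5, 2016: 4}


def cumulative_growth(growth):
    """Multiply the yearly factors to get growth relative to the base year."""
    total = 1
    for year in sorted(growth):
        total *= growth[year]
    return total


def per_server_change(request_factor, server_factor):
    """Change in requests handled per server when both factors change."""
    return Fraction(request_factor) / Fraction(server_factor)


print(cumulative_growth(load_growth))        # 140: load in 2016 relative to 2013
print(per_server_change(5, Fraction(1, 2)))  # 10: 5x requests on half the servers
print(per_server_change(4, 3))               # 4/3: 4x requests on 3x the servers
```

Under those assumptions, the move to bare metal raised per-server throughput about tenfold, while the following year's growth was absorbed mostly by adding servers.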