SlideShare a Scribd company logo
1 of 24
Living with SQL and
NoSQL at Craigslist
      Jeremy Zawodny
          craigslist
There is no stack
     anymore...
-- Mårten Mickos during Wednesday’s Keynote
Data Storage at craigslist
• MySQL
• Memcached
• Redis
• MongoDB
• Sphinx
• Filesystem
Choosing the Right Tool
• Durability
• Performance
• Query API
• Features
• Complexity
• Support
Request Flow (reads)
Browser                       Load Balancer                       Caching Proxy
         Posting, Search, Browse                                  Perl+epoll      Memcached

                                                                        Proxy Cache


     Web Server                                                   Async Services
Apache      mod_perl     Memcached                                Perl+epoll      Memcached

         Posting Cache


                                                                        haproxy


  MongoDB                                   Sphinx                        MySQL
 Archived Postings                   Live and Archived Postings           Live Postings
Request Flow (reads)
Browser                Load Balancer                   Caching Proxy
      Image Requests                                   Perl+epoll    Memcached

                                                             Proxy Cache




                            Image Storage
                        Apache   mod_perl   xfs+JBOD
Data Repositories
   MongoDB                      MySQL                 Filesystem
OldPostings   Email Meta    Postings      Finance    Images      Logs


                             Users       Misc Meta

                             Abuse      WorkQueue

                             Stats      Monitoring



                                     Redis
 Memcached                  Counters         Lists        Sphinx
 Counters      Postings      Blobs      Monitoring   Postings   Internal

  Blobs        Objects     WorkQueue                 Forums     Archive
MySQL at craigslist
•   Vertical Partitioning: Clusters
    •   auth/users, abuse/spam, postings, finance
•   Sub-partitioning: Roles
    •   master, read, long read, dumper, thrash
•   Lots of SSD storage (mostly fusion-io)
    •   solved most of our performance problems
•   Few manual tasks
    •   re-cloning slaves, master swaps
MySQL at craigslist
• MySQL 5.5.x
 • hoping to move to 5.6.x
    • GTID + crash-safe slaves?!?!
• InnoDB almost everywhere
 • InnoDB compression where it works well
 • Large buffer pool (48GB common)
• haproxy sits between clients and servers
MySQL at Craigslist
      Postings Database Cluster




                                       long read

                                                   long read




                                                                        dumper
                                                               thrash
   write




                                read
           read

                  read

                         read




                           haproxy

                           client(s)
Why MySQL?
•   It’s the devil we know!
    •   Very reliable
    •   Lots of Admin and Dev skills
•   Durability
•   Replication
•   Support
    •   Seriously, look at this ecosystem
•   Data Model
Why memcached?
• Wickedly Fast
• Stable
• Virtually zero administration required
• Easily co-exists with CPU-intensive services
• Muti-core? Run more instances!
Memcached at craigslist
• Primary cache for rendered pages
  (compresed and full), serialized objects, and
  misc. other data
• Used for lots of transient data blobs (and
  occasional counters)
• Custom async client library
 • Some key encoding issues
• Durability via client-side mirroring (think
  RAID-1)
Redis at craigslist
• Primary repository of posting activity
  metadata used in analysis tasks
• Remote replication in 2nd data center
• 80+% of data in sorted sets (ZSETS)
• Sharded multi-node cluster
 • See: http://bit.ly/I4XUCj
Why Redis?
• Features
• Performance
• Flexible Persistence
• Excellent but simple API
• Project Vision
• Muti-core? Run more instances!
MongoDB at craigslist
•   Repository of 2.5+ billion archived postings
    •   growing and growing and growing
•   3 shards across 3 node replica sets
    •   duplicate config in 2nd data center
•   ~6TB of data, sized up to 12TB
•   Biggest challenge was data migration
•   Previous talks:
    •   http://bit.ly/HEYJ57 (before)
    •   http://bit.ly/Hr2qMf (after)
Why MongoDB?
• Schema free
• Active community
• Commercial support
• Perl client!
• Ease of scaling
  • Yay! for built-in sharding support
• Fewer single points of failure
  • Replica sets are awesome
Sphinx at craigslist
• Full-text indexing and search of
 • all live postings
 • all archived postings
 • all forums (in progress)
• 300+ million daily queries
Why Sphinx?
• Performance
• Friendly API
• Flexibility in deployment model
• Commercial support
Filesystem at craigslist
• All uploaded images are stored in XFS
• Multiple image sizes, resized upon upload
Why Filesystem?
• Reliable (and Simple)
 • We use XFS for images and databases
 • Proven technology
• Fast
 • Some other filesystems have had
    performance issues
• Easy to move data around
• No other metadata/indexes to worry about
So Many Data Stores...
• Can be hard for developers if you don’t have
  good APIs or abstractions in place!
  • We built an object layer for our MongoDB
    migration
  • It speaks MySQL, Sphinx, MongoDB,
    Memcached
• Relational vs. Non-Relational?
  • In practice, we often just don’t care
  • NoSQL is a stupid label
Craigslist Tech FAQs
• Self-hosted (no virtualization or “cloud”)
• Mix of hardware (2 main vendors)
 • Blades
 • Larger multi-U multi-disk RAID boxes
• Mostly local storage (SAN for backups)
• Virtually all open source infrastructure
  tools
• Famously small (but growing) tech team
Craigslist is Hiring!
• Developers
 • Back-end
 • Front-end
• Systems Administrators
• Network Engineers
• Email: z@craiglist.org plain text resume!

More Related Content

What's hot

How Shopify Scales Rails
How Shopify Scales RailsHow Shopify Scales Rails
How Shopify Scales Rails
jduff
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
Provectus
 

What's hot (20)

대용량 분산 아키텍쳐 설계 #3 대용량 분산 시스템 아키텍쳐
대용량 분산 아키텍쳐 설계 #3 대용량 분산 시스템 아키텍쳐대용량 분산 아키텍쳐 설계 #3 대용량 분산 시스템 아키텍쳐
대용량 분산 아키텍쳐 설계 #3 대용량 분산 시스템 아키텍쳐
 
OpenStreetMap 기반의 Mapbox 오픈소스 매핑 서비스
OpenStreetMap 기반의 Mapbox 오픈소스 매핑 서비스OpenStreetMap 기반의 Mapbox 오픈소스 매핑 서비스
OpenStreetMap 기반의 Mapbox 오픈소스 매핑 서비스
 
How Shopify Scales Rails
How Shopify Scales RailsHow Shopify Scales Rails
How Shopify Scales Rails
 
오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화
 
Social network with microservices
Social network with microservicesSocial network with microservices
Social network with microservices
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
 
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
 
20명 규모의 팀에서 Vault 사용하기
20명 규모의 팀에서 Vault 사용하기20명 규모의 팀에서 Vault 사용하기
20명 규모의 팀에서 Vault 사용하기
 
AWS Aurora 운영사례 (by 배은미)
AWS Aurora 운영사례 (by 배은미)AWS Aurora 운영사례 (by 배은미)
AWS Aurora 운영사례 (by 배은미)
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
MSA ( Microservices Architecture ) 발표 자료 다운로드
MSA ( Microservices Architecture ) 발표 자료 다운로드MSA ( Microservices Architecture ) 발표 자료 다운로드
MSA ( Microservices Architecture ) 발표 자료 다운로드
 
AWS로 사용자 천만 명 서비스 만들기 (윤석찬)- 클라우드 태권 2015
AWS로 사용자 천만 명 서비스 만들기 (윤석찬)- 클라우드 태권 2015 AWS로 사용자 천만 명 서비스 만들기 (윤석찬)- 클라우드 태권 2015
AWS로 사용자 천만 명 서비스 만들기 (윤석찬)- 클라우드 태권 2015
 
실시간 스트리밍 분석 Kinesis Data Analytics Deep Dive
실시간 스트리밍 분석  Kinesis Data Analytics Deep Dive실시간 스트리밍 분석  Kinesis Data Analytics Deep Dive
실시간 스트리밍 분석 Kinesis Data Analytics Deep Dive
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
SRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS BatchSRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS Batch
 
Apache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloudApache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloud
 
AWS로 데이터 마이그레이션을 위한 방안과 옵션 - 박성훈 스토리지 스페셜리스트 테크니컬 어카운트 매니저, AWS :: AWS Summit...
AWS로 데이터 마이그레이션을 위한 방안과 옵션 - 박성훈 스토리지 스페셜리스트 테크니컬 어카운트 매니저, AWS :: AWS Summit...AWS로 데이터 마이그레이션을 위한 방안과 옵션 - 박성훈 스토리지 스페셜리스트 테크니컬 어카운트 매니저, AWS :: AWS Summit...
AWS로 데이터 마이그레이션을 위한 방안과 옵션 - 박성훈 스토리지 스페셜리스트 테크니컬 어카운트 매니저, AWS :: AWS Summit...
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
2017 AWS DB Day | Amazon Aurora 자세히 살펴보기
2017 AWS DB Day | Amazon Aurora 자세히 살펴보기2017 AWS DB Day | Amazon Aurora 자세히 살펴보기
2017 AWS DB Day | Amazon Aurora 자세히 살펴보기
 

Viewers also liked

Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the Numbers
Devin Foley
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
Andrew Dunstan
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQL
mwasaha mwagambo
 

Viewers also liked (20)

Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
 
Webinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBWebinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDB
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the Numbers
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searches
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
 
Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.
 
Red Box Commerce Shopping Cart
Red Box Commerce Shopping CartRed Box Commerce Shopping Cart
Red Box Commerce Shopping Cart
 
Shopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web SitesShopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web Sites
 
Tayra
TayraTayra
Tayra
 
Fusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistFusion-io and MySQL at Craigslist
Fusion-io and MySQL at Craigslist
 
SphinxSearch
SphinxSearchSphinxSearch
SphinxSearch
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinx
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQL
 
Social Media Trends - Content Curation
Social Media Trends - Content CurationSocial Media Trends - Content Curation
Social Media Trends - Content Curation
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
IBM Power Systems: Designed for Data
IBM Power Systems: Designed for DataIBM Power Systems: Designed for Data
IBM Power Systems: Designed for Data
 

Similar to Living with SQL and NoSQL at craigslist, a Pragmatic Approach

Redis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - OmnilogicRedis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - Omnilogic
Felipe Guimarães
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At Craigslist
MySQLConference
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
Heriyadi Janwar
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
DATAVERSITY
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 

Similar to Living with SQL and NoSQL at craigslist, a Pragmatic Approach (20)

High Performance Drupal Sites
High Performance Drupal SitesHigh Performance Drupal Sites
High Performance Drupal Sites
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Redis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - OmnilogicRedis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - Omnilogic
 
Drop acid
Drop acidDrop acid
Drop acid
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At Craigslist
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStack
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011
 
ActiveMQ 5.9.x new features
ActiveMQ 5.9.x new featuresActiveMQ 5.9.x new features
ActiveMQ 5.9.x new features
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityNOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Living with SQL and NoSQL at craigslist, a Pragmatic Approach

  • 1. Living with SQL and NoSQL at Craigslist Jeremy Zawodny craigslist
  • 2. There is no stack anymore... -- Mårten Mickos during Wednesday’s Keynote
  • 3. Data Storage at craigslist • MySQL • Memcached • Redis • MongoDB • Sphinx • Filesystem
  • 4. Choosing the Right Tool • Durability • Performance • Query API • Features • Complexity • Support
  • 5. Request Flow (reads) Browser Load Balancer Caching Proxy Posting, Search, Browse Perl+epoll Memcached Proxy Cache Web Server Async Services Apache mod_perl Memcached Perl+epoll Memcached Posting Cache haproxy MongoDB Sphinx MySQL Archived Postings Live and Archived Postings Live Postings
  • 6. Request Flow (reads) Browser Load Balancer Caching Proxy Image Requests Perl+epoll Memcached Proxy Cache Image Storage Apache mod_perl xfs+JBOD
  • 7. Data Repositories MongoDB MySQL Filesystem OldPostings Email Meta Postings Finance Images Logs Users Misc Meta Abuse WorkQueue Stats Monitoring Redis Memcached Counters Lists Sphinx Counters Postings Blobs Monitoring Postings Internal Blobs Objects WorkQueue Forums Archive
  • 8. MySQL at craigslist • Vertical Partitioning: Clusters • auth/users, abuse/spam, postings, finance • Sub-partitioning: Roles • master, read, long read, dumper, thrash • Lots of SSD storage (mostly fusion-io) • solved most of our performance problems • Few manual tasks • re-cloning slaves, master swaps
  • 9. MySQL at craigslist • MySQL 5.5.x • hoping to move to 5.6.x • GTID + crash-safe slaves?!?! • InnoDB almost everywhere • InnoDB compression where it works well • Large buffer pool (48GB common) • haproxy sits between clients and servers
  • 10. MySQL at Craigslist Postings Database Cluster long read long read dumper thrash write read read read read haproxy client(s)
  • 11. Why MySQL? • It’s the devil we know! • Very reliable • Lots of Admin and Dev skills • Durability • Replication • Support • Seriously, look at this ecosystem • Data Model
  • 12. Why memcached? • Wickedly Fast • Stable • Virtually zero administration required • Easily co-exists with CPU-intensive services • Muti-core? Run more instances!
  • 13. Memcached at craigslist • Primary cache for rendered pages (compresed and full), serialized objects, and misc. other data • Used for lots of transient data blobs (and occasional counters) • Custom async client library • Some key encoding issues • Durability via client-side mirroring (think RAID-1)
  • 14. Redis at craigslist • Primary repository of posting activity metadata used in analysis tasks • Remote replication in 2nd data center • 80+% of data in sorted sets (ZSETS) • Sharded multi-node cluster • See: http://bit.ly/I4XUCj
  • 15. Why Redis? • Features • Performance • Flexible Persistence • Excellent but simple API • Project Vision • Muti-core? Run more instances!
  • 16. MongoDB at craigslist • Repository of 2.5+ billion archived postings • growing and growing and growing • 3 shards across 3 node replica sets • duplicate config in 2nd data center • ~6TB of data, sized up to 12TB • Biggest challenge was data migration • Previous talks: • http://bit.ly/HEYJ57 (before) • http://bit.ly/Hr2qMf (after)
  • 17. Why MongoDB? • Schema free • Active community • Commercial support • Perl client! • Ease of scaling • Yay! for built-in sharding support • Fewer single points of failure • Replica sets are awesome
  • 18. Sphinx at craigslist • Full-text indexing and search of • all live postings • all archived postings • all forums (in progress) • 300+ million daily queries
  • 19. Why Sphinx? • Performance • Friendly API • Flexibility in deployment model • Commercial support
  • 20. Filesystem at craigslist • All uploaded images are stored in XFS • Multiple image sizes, resized upon upload
  • 21. Why Filesystem? • Reliable (and Simple) • We use XFS for images and databases • Proven technology • Fast • Some other filesystems have had performance issues • Easy to move data around • No other metadata/indexes to worry about
  • 22. So Many Data Stores... • Can be hard for developers if you don’t have good APIs or abstractions in place! • We built an object layer for our MongoDB migration • It speaks MySQL, Sphinx, MongoDB, Memcached • Relational vs. Non-Relational? • In practice, we often just don’t care • NoSQL is a stupid label
  • 23. Craigslist Tech FAQs • Self-hosted (no virtualization or “cloud”) • Mix of hardware (2 main vendors) • Blades • Larger multi-U multi-disk RAID boxes • Mostly local storage (SAN for backups) • Virtually all open source infrastructure tools • Famously small (but growing) tech team
  • 24. Craigslist is Hiring! • Developers • Back-end • Front-end • Systems Administrators • Network Engineers • Email: z@craiglist.org plain text resume!

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n