Living with SQL and NoSQL at Craigslist
Jeremy Zawodny
craigslist
There is no stack anymore...
-- Mårten Mickos during Wednesday’s Keynote
Data Storage at craigslist
• MySQL
• Memcached
• Redis
• MongoDB
• Sphinx
• Filesystem
Choosing the Right Tool
• Durability
• Performance
• Query API
• Features
• Complexity
• Support
Request Flow (reads)
[Diagram: posting, search, and browse requests, top to bottom]
• Browser → Load Balancer → Caching Proxy (Perl+epoll, Memcached): proxy cache
• Web Server (Apache, mod_perl, Memcached): posting cache
• Async Services (Perl+epoll, Memcached)
• haproxy
• MongoDB: archived postings
• Sphinx: live and archived postings
• MySQL: live postings
Request Flow (reads)
[Diagram: image requests, top to bottom]
• Browser → Load Balancer → Caching Proxy (Perl+epoll, Memcached): proxy cache
• Image Storage (Apache, mod_perl, xfs+JBOD)
Data Repositories
• MongoDB: OldPostings, Email Meta
• MySQL: Postings, Finance, Users, Misc Meta, Abuse, WorkQueue, Stats, Monitoring
• Filesystem: Images, Logs
• Memcached: Counters, Postings, Blobs, Objects
• Redis: Counters, Lists, Blobs, Monitoring, WorkQueue
• Sphinx: Postings, Internal, Forums, Archive
MySQL at craigslist
•   Vertical Partitioning: Clusters
    •   auth/users, abuse/spam, postings, finance
•   Sub-partitioning: Roles
    •   master, read, long read, dumper, thrash
•   Lots of SSD storage (mostly fusion-io)
    •   solved most of our performance problems
•   Few manual tasks
    •   re-cloning slaves, master swaps
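The cluster and role split above can be pictured as a two-level lookup. This is only a minimal sketch: the host names, the `TOPOLOGY` table, and the `pick_server` helper are hypothetical, and round-robin is an assumed balancing policy, not a description of the actual craigslist setup.

```python
import itertools

# Hypothetical topology: cluster -> role -> hosts. All host names are made up.
TOPOLOGY = {
    "postings": {
        "master": ["postings-m1"],
        "read": ["postings-r1", "postings-r2"],
        "long read": ["postings-lr1"],
        "dumper": ["postings-d1"],
        "thrash": ["postings-t1"],
    },
    "finance": {
        "master": ["finance-m1"],
        "read": ["finance-r1"],
    },
}

# One endless round-robin iterator per (cluster, role) pair.
_cursors = {
    (cluster, role): itertools.cycle(hosts)
    for cluster, roles in TOPOLOGY.items()
    for role, hosts in roles.items()
}


def pick_server(cluster, role):
    """Return the next host for the given cluster and role (round-robin)."""
    try:
        return next(_cursors[(cluster, role)])
    except KeyError:
        raise ValueError(f"no servers for cluster={cluster!r} role={role!r}")
```

Writes would ask for the `master` role, ordinary queries for `read`, and slow reporting queries for `long read`, so heavy work stays off the servers that answer page requests.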
MySQL at craigslist
• MySQL 5.5.x
 • hoping to move to 5.6.x
    • GTID + crash-safe slaves?!?!
• InnoDB almost everywhere
 • InnoDB compression where it works well
 • Large buffer pool (48GB common)
• haproxy sits between clients and servers
MySQL at Craigslist
      Postings Database Cluster
[Diagram: client(s) connect through haproxy to servers labeled by role]
• write (1)
• read (4)
• long read (2)
• dumper (1)
• thrash (1)
Why MySQL?
•   It’s the devil we know!
    •   Very reliable
    •   Lots of Admin and Dev skills
•   Durability
•   Replication
•   Support
    •   Seriously, look at this ecosystem
•   Data Model
Why memcached?
• Wickedly Fast
• Stable
• Virtually zero administration required
• Easily co-exists with CPU-intensive services
• Multi-core? Run more instances!
Memcached at craigslist
• Primary cache for rendered pages
  (compressed and full), serialized objects, and
  misc. other data
• Used for lots of transient data blobs (and
  occasional counters)
• Custom async client library
 • Some key encoding issues
• Durability via client-side mirroring (think
  RAID-1)
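Client-side mirroring in the RAID-1 sense can be sketched as follows. The slide gives only the idea, so everything here is an assumption for illustration: the `MirroredCache` class is hypothetical, plain dicts stand in for memcached nodes, and picking the mirror as "the next node after the primary" is just one simple choice.

```python
import zlib


class MirroredCache:
    """Write each key to two nodes; read from whichever still has it."""

    def __init__(self, nodes):
        if len(nodes) < 2:
            raise ValueError("mirroring needs at least two nodes")
        self.nodes = nodes  # dicts standing in for memcached servers

    def _pair(self, key):
        # crc32 gives a hash that is stable across processes,
        # unlike Python's built-in hash() for strings.
        primary = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        mirror = (primary + 1) % len(self.nodes)
        return self.nodes[primary], self.nodes[mirror]

    def set(self, key, value):
        # The client, not the server, performs both writes.
        for node in self._pair(key):
            node[key] = value

    def get(self, key):
        # Fall back to the mirror if the primary lost the key.
        for node in self._pair(key):
            if key in node:
                return node[key]
        return None
```

If one node restarts with an empty cache, reads for its keys are still served from the mirror, at the cost of storing every value twice.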
Redis at craigslist
• Primary repository of posting activity
  metadata used in analysis tasks
• Remote replication in 2nd data center
• 80+% of data in sorted sets (ZSETS)
• Sharded multi-node cluster
 • See: http://bit.ly/I4XUCj
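To make the combination of sorted sets and sharding concrete, here is a minimal in-memory sketch. It does not use a Redis client: `ShardedSortedSets` is a hypothetical stand-in whose method names merely echo the Redis ZADD and ZRANGEBYSCORE commands, and hashing the key modulo the shard count is an assumed placement rule that may differ from the scheme described at the link above.

```python
import zlib


class ShardedSortedSets:
    """In-memory stand-in for sorted sets spread over several nodes."""

    def __init__(self, shard_count):
        # Each shard maps key -> {member: score}.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # All members of one sorted set live on the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def zadd(self, key, score, member):
        self._shard_for(key).setdefault(key, {})[member] = score

    def zrangebyscore(self, key, low, high):
        zset = self._shard_for(key).get(key, {})
        hits = [(score, member) for member, score in zset.items()
                if low <= score <= high]
        # Order by score, then by member, as a sorted set would.
        return [member for _, member in sorted(hits)]


store = ShardedSortedSets(shard_count=4)
# Hypothetical key and events, scored by Unix timestamp.
store.zadd("activity:42", 1700000300, "flagged")
store.zadd("activity:42", 1700000100, "posted")
store.zadd("activity:42", 1700000200, "edited")
recent = store.zrangebyscore("activity:42", 1700000100, 1700000200)
```

Scoring members by timestamp makes "what happened between these two times" a single range query, which suits analysis over activity metadata.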
Why Redis?
• Features
• Performance
• Flexible Persistence
• Excellent but simple API
• Project Vision
• Multi-core? Run more instances!
MongoDB at craigslist
•   Repository of 2.5+ billion archived postings
    •   growing and growing and growing
•   3 shards across 3 node replica sets
    •   duplicate config in 2nd data center
•   ~6TB of data, sized up to 12TB
•   Biggest challenge was data migration
•   Previous talks:
    •   http://bit.ly/HEYJ57 (before)
    •   http://bit.ly/Hr2qMf (after)
Why MongoDB?
• Schema free
• Active community
• Commercial support
• Perl client!
• Ease of scaling
  • Yay! for built-in sharding support
• Fewer single points of failure
  • Replica sets are awesome
Sphinx at craigslist
• Full-text indexing and search of
 • all live postings
 • all archived postings
 • all forums (in progress)
• 300+ million daily queries
Why Sphinx?
• Performance
• Friendly API
• Flexibility in deployment model
• Commercial support
Filesystem at craigslist
• All uploaded images are stored in XFS
• Multiple image sizes, resized upon upload
Why Filesystem?
• Reliable (and Simple)
 • We use XFS for images and databases
 • Proven technology
• Fast
 • Some other filesystems have had
    performance issues
• Easy to move data around
• No other metadata/indexes to worry about
So Many Data Stores...
• Can be hard for developers if you don’t have
  good APIs or abstractions in place!
  • We built an object layer for our MongoDB
    migration
  • It speaks MySQL, Sphinx, MongoDB,
    Memcached
• Relational vs. Non-Relational?
  • In practice, we often just don’t care
  • NoSQL is a stupid label
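An object layer of the kind mentioned above might look roughly like this. The deck says only that the layer speaks MySQL, Sphinx, MongoDB, and Memcached. The `PostingStore` class, its lookup order (cache, then live store, then archive, inferred from the read-flow diagram), and the dict stand-ins for each backend are assumptions made for illustration.

```python
class PostingStore:
    """Hide which backend holds a posting behind a single get() call."""

    def __init__(self, cache, live, archive):
        self.cache = cache      # stand-in for Memcached
        self.live = live        # stand-in for MySQL (live postings)
        self.archive = archive  # stand-in for MongoDB (archived postings)

    def get(self, posting_id):
        key = f"posting:{posting_id}"
        if key in self.cache:
            return self.cache[key]
        # Try the live store first, then the archive.
        for backend in (self.live, self.archive):
            posting = backend.get(posting_id)
            if posting is not None:
                self.cache[key] = posting  # populate the cache on a miss
                return posting
        return None


cache = {}
store = PostingStore(
    cache=cache,
    live={1: {"title": "bike"}},
    archive={2: {"title": "old couch"}},
)
live_posting = store.get(1)
archived_posting = store.get(2)
```

Callers ask for a posting by ID and never learn whether it came from a relational or non-relational store, which is the sense in which the distinction stops mattering in practice.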
Craigslist Tech FAQs
• Self-hosted (no virtualization or “cloud”)
• Mix of hardware (2 main vendors)
 • Blades
 • Larger multi-U multi-disk RAID boxes
• Mostly local storage (SAN for backups)
• Virtually all open source infrastructure
  tools
• Famously small (but growing) tech team
Craigslist is Hiring!
• Developers
 • Back-end
 • Front-end
• Systems Administrators
• Network Engineers
• Email: z@craigslist.org (plain-text resume!)

 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Living with SQL and NoSQL at craigslist, a Pragmatic Approach

  • 1. Living with SQL and NoSQL at Craigslist Jeremy Zawodny craigslist
  • 2. There is no stack anymore... -- Mårten Mickos during Wednesday’s Keynote
  • 3. Data Storage at craigslist • MySQL • Memcached • Redis • MongoDB • Sphinx • Filesystem
  • 4. Choosing the Right Tool • Durability • Performance • Query API • Features • Complexity • Support
  • 5. Request Flow (reads): Posting, Search, Browse • Browser • Load Balancer • Caching Proxy (Perl+epoll, Memcached; Proxy Cache) • Web Server (Apache, mod_perl, Memcached; Posting Cache) • Async Services (Perl+epoll, Memcached) • haproxy • MongoDB (Archived Postings) • Sphinx (Live and Archived Postings) • MySQL (Live Postings)
  • 6. Request Flow (reads): Image Requests • Browser • Load Balancer • Caching Proxy (Perl+epoll, Memcached; Proxy Cache) • Image Storage (Apache, mod_perl, xfs+JBOD)
  • 7. Data Repositories • MongoDB: OldPostings, Email Meta • MySQL: Postings, Finance, Users, Misc Meta, Abuse, WorkQueue, Stats, Monitoring • Filesystem: Images, Logs • Redis: Counters, Lists, Blobs, Monitoring, WorkQueue • Memcached: Counters, Postings, Blobs, Objects • Sphinx: Postings, Internal, Forums, Archive
  • 8. MySQL at craigslist • Vertical Partitioning: Clusters • auth/users, abuse/spam, postings, finance • Sub-partitioning: Roles • master, read, long read, dumper, thrash • Lots of SSD storage (mostly fusion-io) • solved most of our performance problems • Few manual tasks • re-cloning slaves, master swaps
  • 9. MySQL at craigslist • MySQL 5.5.x • hoping to move to 5.6.x • GTID + crash-safe slaves?!?! • InnoDB almost everywhere • InnoDB compression where it works well • Large buffer pool (48GB common) • haproxy sits between clients and servers
  • 10. MySQL at Craigslist • Postings Database Cluster • Servers by role: write, read (×4), long read (×2), dumper, thrash • haproxy • client(s)
  • 11. Why MySQL? • It’s the devil we know! • Very reliable • Lots of Admin and Dev skills • Durability • Replication • Support • Seriously, look at this ecosystem • Data Model
  • 12. Why memcached? • Wickedly Fast • Stable • Virtually zero administration required • Easily co-exists with CPU-intensive services • Multi-core? Run more instances!
  • 13. Memcached at craigslist • Primary cache for rendered pages (compressed and full), serialized objects, and misc. other data • Used for lots of transient data blobs (and occasional counters) • Custom async client library • Some key encoding issues • Durability via client-side mirroring (think RAID-1)
  • 14. Redis at craigslist • Primary repository of posting activity metadata used in analysis tasks • Remote replication in 2nd data center • 80+% of data in sorted sets (ZSETS) • Sharded multi-node cluster • See: http://bit.ly/I4XUCj
  • 15. Why Redis? • Features • Performance • Flexible Persistence • Excellent but simple API • Project Vision • Multi-core? Run more instances!
  • 16. MongoDB at craigslist • Repository of 2.5+ billion archived postings • growing and growing and growing • 3 shards across 3 node replica sets • duplicate config in 2nd data center • ~6TB of data, sized up to 12TB • Biggest challenge was data migration • Previous talks: • http://bit.ly/HEYJ57 (before) • http://bit.ly/Hr2qMf (after)
  • 17. Why MongoDB? • Schema free • Active community • Commercial support • Perl client! • Ease of scaling • Yay! for built-in sharding support • Fewer single points of failure • Replica sets are awesome
  • 18. Sphinx at craigslist • Full-text indexing and search of • all live postings • all archived postings • all forums (in progress) • 300+ million daily queries
  • 19. Why Sphinx? • Performance • Friendly API • Flexibility in deployment model • Commercial support
  • 20. Filesystem at craigslist • All uploaded images are stored in XFS • Multiple image sizes, resized upon upload
  • 21. Why Filesystem? • Reliable (and Simple) • We use XFS for images and databases • Proven technology • Fast • Some other filesystems have had performance issues • Easy to move data around • No other metadata/indexes to worry about
  • 22. So Many Data Stores... • Can be hard for developers if you don’t have good APIs or abstractions in place! • We built an object layer for our MongoDB migration • It speaks MySQL, Sphinx, MongoDB, Memcached • Relational vs. Non-Relational? • In practice, we often just don’t care • NoSQL is a stupid label
  • 23. Craigslist Tech FAQs • Self-hosted (no virtualization or “cloud”) • Mix of hardware (2 main vendors) • Blades • Larger multi-U multi-disk RAID boxes • Mostly local storage (SAN for backups) • Virtually all open source infrastructure tools • Famously small (but growing) tech team
  • 24. Craigslist is Hiring! • Developers • Back-end • Front-end • Systems Administrators • Network Engineers • Email: z@craigslist.org plain text resume!
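
Slide 13 mentions "durability via client-side mirroring (think RAID-1)" for memcached. The deck does not show the custom async client, so the following is only a minimal synchronous sketch of the general idea: every write goes to all mirrors, and a read falls through to the next mirror when one is missing the key or is down. `FakeCacheNode` and `MirroredCache` are hypothetical names, and the in-memory dict stands in for a real memcached server.

```python
from typing import Dict, List, Optional


class FakeCacheNode:
    """In-memory stand-in for one memcached server (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.up = True  # flip to False to simulate a dead node

    def set(self, key: str, value: bytes) -> bool:
        if not self.up:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[bytes]:
        if not self.up:
            return None
        return self.data.get(key)


class MirroredCache:
    """Write each key to every mirror; read from the first mirror that has it."""

    def __init__(self, mirrors: List[FakeCacheNode]) -> None:
        self.mirrors = mirrors

    def set(self, key: str, value: bytes) -> int:
        # Returns how many mirrors accepted the write.
        return sum(1 for node in self.mirrors if node.set(key, value))

    def get(self, key: str) -> Optional[bytes]:
        for node in self.mirrors:
            value = node.get(key)
            if value is not None:
                return value
        return None


primary, secondary = FakeCacheNode(), FakeCacheNode()
cache = MirroredCache([primary, secondary])
cache.set("posting:123", b"<html>rendered page</html>")

primary.up = False  # lose one node; the mirror still serves the key
survived = cache.get("posting:123")
```

The trade-off is the same as with RAID-1: each value costs twice the memory and twice the write traffic, in exchange for surviving the loss of a single cache node without a cold cache.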

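Slide 22 says an object layer hides the individual data stores from developers. The deck gives no details of that layer, so this is a hedged sketch of one way such an abstraction could look: a repository that checks a cache first, then a live store, then an archive. The lookup order is an assumption based on the request-flow slides (live postings in MySQL, archived postings in MongoDB), and `DictStore` and `PostingRepository` are hypothetical names backed by plain dicts rather than real clients.

```python
from typing import Dict, List, Optional


class DictStore:
    """In-memory stand-in for a real backend such as MySQL, MongoDB, or memcached."""

    def __init__(self, name: str, rows: Optional[Dict[int, dict]] = None) -> None:
        self.name = name
        self.rows: Dict[int, dict] = dict(rows or {})

    def fetch(self, posting_id: int) -> Optional[dict]:
        return self.rows.get(posting_id)

    def save(self, posting_id: int, posting: dict) -> None:
        self.rows[posting_id] = posting


class PostingRepository:
    """Callers ask for a posting by id and never see which store answered."""

    def __init__(self, cache: DictStore, stores: List[DictStore]) -> None:
        self.cache = cache
        self.stores = stores  # tried in order, e.g. live store first, archive second

    def get(self, posting_id: int) -> Optional[dict]:
        cached = self.cache.fetch(posting_id)
        if cached is not None:
            return cached
        for store in self.stores:
            posting = store.fetch(posting_id)
            if posting is not None:
                self.cache.save(posting_id, posting)  # populate the cache on a hit
                return posting
        return None


live = DictStore("mysql", {1: {"title": "bike for sale"}})
archive = DictStore("mongodb", {2: {"title": "old couch"}})
cache = DictStore("memcached")
repo = PostingRepository(cache, [live, archive])

old_posting = repo.get(2)  # misses the cache and the live store, found in the archive
```

With an interface like this, calling code does not change when a posting moves from the live store to the archive, which fits the slide's point that relational versus non-relational often does not matter in practice.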