MongoDB Conference Berlin 2011




MongoDB as a queryable cache
About me


   •      Martin Tepper
   •      Lead Developer at Travel IQ
   •      http://monogreen.de




MongoDB as a queryable cache · Martin Tepper, monogreen.de · 2011-03-25
Contents


   •      About Travel IQ
   •      The problem
   •      The solution
   •      The headaches




About Travel IQ


   •      Meta Search Engine for Flights and Hotels
   •      9 Hotel Providers
   •      21 Flight Providers
   •      ~ 6000 searches per day
   •      ~ 64k provider queries per day




About Travel IQ


   •      Real-Time Aggregation
   •      Ruby/Rails based
   •      API-Driven




Quick aside


   •      Ruby: object-oriented scripting language
   •      Rails: MVC web application framework
   •      ActiveRecord: ORM framework




The Problem
Basic Architecture




Strongly Normalized


   •      Very organized
   •      Reuse of models
   •      Saves disk space
   •      But …




sql = <<-SQL
SELECT MIN(outerei.id) FROM
(
   SELECT
    OBJ1.starts_at      AS OBJ1_starts_at,
    OBJ1.ends_at        AS OBJ1_ends_at,
    OBJ1.origin_id      AS OBJ1_origin_id,
    OBJ1.destination_id AS OBJ1_destination_id,
    MIN(P1.price)       AS the_price
    FROM packages P1
    LEFT JOIN journeys OBJ1 ON (P1.outbound_journey_id = OBJ1.id)
    LEFT JOIN results R1 ON (R1.package_id = P1.id)
    LEFT JOIN packagings PA1a ON (PA1a.package_id = P1.id AND PA1a.position = 1)
    LEFT JOIN offers O1a ON (PA1a.offer_id = O1a.id)
    WHERE R1.search_id        IN (#{search_id})
    AND R1.search_type        = 'FlightSearch'
    AND O1a.expires_at        > #{expiring_after}
    GROUP BY
    OBJ1.starts_at, OBJ1.ends_at,
    OBJ1.origin_id, OBJ1.destination_id
  ) AS innerei JOIN (
    SELECT P2.id,
    OBJ2.starts_at      AS OBJ2_starts_at,
    OBJ2.ends_at        AS OBJ2_ends_at,
    OBJ2.origin_id      AS OBJ2_origin_id,
    OBJ2.destination_id AS OBJ2_destination_id,
    P2.price
    FROM packages P2
    LEFT JOIN results R2 ON (R2.package_id = P2.id)
    LEFT JOIN journeys OBJ2 ON (P2.outbound_journey_id = OBJ2.id)
    LEFT JOIN packagings PA2a ON (PA2a.package_id = P2.id AND PA2a.position = 1)
    LEFT JOIN offers O2a ON (PA2a.offer_id = O2a.id)
    WHERE R2.search_id        IN (#{search_id})
The problem


   •      Strongly normalized database
   •      Complex query requirements
   •      Lots of joins
   •      ActiveRecord and rendering overhead
   •      Slow API calls




The Solution
Solution 1: Schema


   •      Redo the schema
   •      Migration hard
   •      Some relationships hard to denormalize




Solution 2: Memcached


   •      Memcached
   •      Very fast response times
   •      But no real queries
   →      Results in a horrible abstraction layer




Memcached response times over time

   [Chart: response time of API call in seconds (y-axis, 0 to 10.0) against seconds after search start (x-axis, 1 to 45); the plotted data is not recoverable]

Solution 3: MongoDB


   •      Document-oriented – less render overhead
   •      Grouping of offers
   •      Proper queries and counts
   •      Still quite fast




How we use MongoDB


   •      Replica set with 2 nodes and 2 arbiters
   •      Two servers with 16 cores / 64GB RAM
        →      run MySQL and MongoDB
   •      ~ 600 writes/s and reads/s under normal load
   •      ~ 6000 writes/s doable




MongoDB response times over time

   [Chart: response time of API call in seconds (y-axis, 0 to 10.0) against seconds after search start (x-axis, 1 to 45); the plotted data is not recoverable]

The Headaches
Problems with MongoDB


   •      Segmentation Faults
   •      Only in production
   →      Replica Set helped a lot
   →      Fixed with nightly build




Problems with MongoDB


   •      Write performance during peak load
   •      Lots of small concurrent writes
   →      Solved by bundling writes
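
The slides do not say how the writes were bundled. One common approach, sketched here as an assumption, is to buffer documents in the application and hand them to the database in batches. `WriteBuffer`, its batch size and the sink block are hypothetical names; in a real setup the sink would issue a batch insert (for example `insert_many` in the current MongoDB Ruby driver).

```ruby
# Collects documents and passes them on in batches instead of one by one.
# The sink block stands in for a batch insert into a MongoDB collection.
class WriteBuffer
  def initialize(batch_size, &sink)
    @batch_size = batch_size
    @sink = sink
    @pending = []
    @lock = Mutex.new # writers may run on several threads
  end

  # Queue one document; flush automatically once the batch is full.
  def <<(doc)
    batch = nil
    @lock.synchronize do
      @pending << doc
      batch = take_pending if @pending.size >= @batch_size
    end
    @sink.call(batch) if batch # call the sink outside the lock
    self
  end

  # Push out whatever is still queued, e.g. when a search has finished.
  def flush
    batch = @lock.synchronize { take_pending }
    @sink.call(batch) unless batch.empty?
  end

  private

  def take_pending
    batch = @pending
    @pending = []
    batch
  end
end

written = []
buffer = WriteBuffer.new(3) { |docs| written << docs }
7.times { |i| buffer << { offer_id: i } }
buffer.flush
puts written.map(&:size).inspect
```

Seven small writes turn into three round trips here. The trade-off is that buffered documents become visible in the cache slightly later and are lost if the process dies before a flush.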




Problems with MongoDB


   •      Hotel data too big to denormalize
   •      In separate collection
   →      Solved with app-level “join”
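
An app-level join can be sketched as follows; this is an illustration under assumptions, not the code used at Travel IQ. Result documents carry only a `hotel_id`, the matching hotels are fetched from the separate collection in a single lookup, and the two are merged in Ruby. `join_hotels`, the field names and the sample data are hypothetical; the lookup lambda stands in for a query such as `{ _id: { '$in' => ids } }`.

```ruby
# Merge hotel data into result documents with one extra lookup.
# `find_hotels` is any callable that takes a list of ids and returns the
# matching hotel documents; with MongoDB it would run a single '$in' query.
def join_hotels(results, find_hotels)
  ids = results.map { |r| r[:hotel_id] }.uniq
  hotels_by_id = find_hotels.call(ids).each_with_object({}) do |hotel, index|
    index[hotel[:_id]] = hotel
  end
  # A result whose hotel is missing gets hotel: nil rather than being dropped.
  results.map { |r| r.merge(hotel: hotels_by_id[r[:hotel_id]]) }
end

# Hypothetical sample data: an array stands in for the hotel collection.
hotel_collection = [
  { _id: 1, name: 'Hotel Adler', stars: 4 },
  { _id: 2, name: 'Pension Berg', stars: 2 }
]
find_hotels = ->(ids) { hotel_collection.select { |h| ids.include?(h[:_id]) } }

results = [
  { hotel_id: 1, price: 80.0 },
  { hotel_id: 2, price: 45.0 },
  { hotel_id: 1, price: 95.0 }
]

join_hotels(results, find_hotels).each do |r|
  puts "#{r[:hotel][:name]}: #{r[:price]}"
end
```

The cost is two round trips instead of one, but the large hotel documents are stored once rather than copied into every result.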




Problems with MongoDB


   •      Data consistency
   •      Typical caching problem
   •      Updates to MySQL must also be applied in MongoDB
   →      Solved with callbacks in ActiveRecord
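
The callback idea can be sketched in plain Ruby so that it runs without Rails or a database. In ActiveRecord the two hooks below would be registered with `after_save` and `after_destroy`; here they are called by hand. The `Offer` class, its fields and the two hashes standing in for MySQL and MongoDB are assumptions for illustration.

```ruby
# Plain-Ruby stand-in for a model whose callbacks keep the cache in sync.
# `store` plays the role of MySQL and `cache` the role of MongoDB.
class Offer
  attr_reader :id
  attr_accessor :price

  def initialize(id, price, store, cache)
    @id = id
    @price = price
    @store = store
    @cache = cache
  end

  def save
    @store[id] = { price: price } # primary write
    sync_cache                    # ActiveRecord: after_save :sync_cache
    true
  end

  def destroy
    @store.delete(id)
    evict_cache                   # ActiveRecord: after_destroy :evict_cache
    true
  end

  private

  # Rewrite the cached document whenever the record changes.
  def sync_cache
    @cache[id] = { _id: id, price: price }
  end

  # Remove the cached document together with the record.
  def evict_cache
    @cache.delete(id)
  end
end

store = {}
cache = {}
offer = Offer.new(1, 99.0, store, cache)
offer.save
offer.price = 89.0
offer.save
puts cache[1].inspect
offer.destroy
puts cache.empty?
```

Writes that bypass the model, such as raw SQL updates, do not trigger the hooks, so in this design they would leave the cache stale.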




Thank you
