MongoDB at eBay


Published on

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

MongoDB at eBay

  1. 1. Yuri FinkelsteineBay Platform Architectyfinkelstein@ebay.comMay 2012
  2. 2. DB Scalability @ eBay eBay is one of the first and largest BASE environments based on Oracle DB App1 App2 • Basic Availability • Soft-state Business Business • Eventual consistency Logic Logic Every database we use is shared and partitioned Hint (shard key) Hint (shard key) • N logical hosts names are defined for each use case ahead of time DAL DAL • These logical hosts are mapped to physical based on static mapping tables which are controlled by DBAs Framework Framework • A common ORM framework called DAL provides powerful and consistent patterns for data scalability Applications If the client provides a hint along with every DB F1(Hint) F2(Hint) query: • DAL maps the hint to a logical host using one of N mapping Logical DB schemes (ex: modulus, lookup table, range, etc) … hosts … • Logical host is then mapped to a physical using L-to-Ph map (shards) • The query is sent to just one shard If the client does not have a hint, the query is sent to Config all shards and the results are joined on the client Physical with the help of DAL framework … Master DB Side-effects: hosts • Hint is not part of the query; client has to manage it Physical • Logical to Physical mapping scheme becomes extra piece of … standby client configuration • Shard rebalancing is “DBA magic” DB hosts
  3. 3. Key desired improvements All eBay site-facing applications use the scheme outlined above It’s proven to scale to tens of thousands of developers, petabytes of data, hundreds of millions of SQL queries per day But there is always room for improvements and new ideas • ORM is not the fastest way to develop; how do we achieve faster development cycles and reduce schema mapping frictions? • How do we add new attributes to tables faster and without DBA’s involvement? Schema free approach sounds interesting. • Can we make the hint transparent, ex: auto-extract it from queries? • Can we rebalance the data seamlessly and automatically? • Can we add shards faster in order to scale out on demand and transparently to applications? • How do we deploy new DBs to the cloud on demand? And what about performance? Can we use RAM more aggressively and seamlessly to speed up queries?
  4. 4. Enters MongoDB We are playing with MongoDB since 2010. Why? Business Logic Document Its scalability scheme is very similar to how we shard RDBMS Morphia/Mongo • Single master for writes, eventually consistent slaves for Driver reads Dynamic • Horizontal partitioning of data sets is a norm at eBay Config • MongoS is performing familiar scatter-gather and client- side merge-sorts MongoS F(Shard Key) We don’t use distributed transactions since day 1; transactional updates of multiple tables … that we do use can be simulated by atomic <- Replicas -> updates of a single Mongo document MongoDB offers a number of features that … help address our goals mentioned earlier: • Developers love document model and schema-free persistence … • Hints are embedded into the queries • MongoDB has automatic shard rebalancing • Shards can be added on demand without application restart and data will be auto-rebalanced ---------- Shards ------- • We can easily bring it up in the cloud since cloud machines have storage
  5. 5. Case study #1: eBay Search Suggestions  Search suggestion list is a MongoDB document indexed by word prefix as well as by some metadata: product category, search domain, etc.  Must have < 60-70msec round trip end to end  MongoDB query < 1.4msec  Data set fits in RAM; 100-s M documents  Data is bulk loaded once a day from Hadoop, but can be tweaked on demand during sale promotions, etc  Single replica set, no shards in this case  MongoDB benefits: • Multiple indexes allow flexible lookups • In-memory data placement ensures lookup speed • Large data set is durable and replicated
  6. 6. Case Study #2: Cloud Manager “State Hub” Query  State Hub powers eBay CloudProvision ResourcesResources  Every resource provisioned by the cloud is and Topology represented by a single Mongo document  Documents contain highly structured metadata reflecting roles and grouping of the resources  Lookup by both primary and secondary State Hub indexes Mongo  Several GB data sets, easily fit in RAM Update  Documents are not uniform resource state  All resources have “State” field which is updated periodically to reflect health state of the underlying resource  Mixed workload: lots of in-place writes, but also lots of read queries
  7. 7. Case Study #3: eBay Merchandizing Info Cache  Merchandizing backend powers eBay product/item classification and categorization  Each MongoDB document represents a cluster of similar products  Numerous relationships between clusters are modeled as R1 document attributes Cluster1 Cluster2  Relationship hierarchy traversal is achieved by issuing a R3 number of queries on “edge” attributes R2  Each instance of such a hierarchy is called a model; there Cluster3 are lots of models  Again, data set fits in RAM, single replica set  Replica set members are located in 3 different data centers (3+2+2) with all members in a single data center having higher weight to avoid moving master away  MongoDB benefits: • Schema-free design and declarative indexes are perfect for this use case where new attributes and new queries are constantly being added • Async replication across multiple data centers • MongoDB Java Driver ensures automatic detection of proximity of clients to replica set members; reads with slaveOK=true are served from local data center nodes which insures low response latency
  8. 8. Case Study #4: Zoom – Media Metadata Store This is a new mega project which is a work in progress MongoDB is being evaluated as a storage backend for all media-related metadata on the site (example: picture IDs with lots attributes) Requirements: • Tens of TBs data set, Millions of documents: data set must be partitioned; this is our first use case where MongoDB sharding is used • System of record for picture info; data can not be lost! • Replication/DR across 2 data centers; local DC reads are required • Queries are from site-facing flows; <10msec response time SLA • Mixed workload: both inserts and reads are happening concurrently all the time Can MongoDB do it ??
  9. 9. Zoom: Data Model  2 main collections: Item and Image • Item references multiple Images  Item represents eBay Item: • _id in Item is external ID of the item in eBay site DB • These IDs are already sharded in balanced across N logical DB hosts using ID ranges • We use MongoDB pre-split points for initial mapping our N site DB shards to M MongoDB shards • This ensures good balance between the shards;  Image represents a picture attached to an Item • _id in Image is md5 of the image content • This ensures good distribution across any number of shards • Md5 is also used to find duplicate images  Our choice of document IDs in both collections ensures good balance across Mongo shards  We never query both collections in a single service request to ensure data consistency and to have only one index lookup
  10. 10. Zoom: Service Topology and Configuration  MongoS is deployed on app servers • Ensures network IO on MongoS won’t become a bottleneck • This is a very familiar pattern in eBay as was explained in the>--- DC1(Primary)--- beginning of this presentation  M shards; each replica set has 6 members M M M M • 3 + 3 in 2 data centers • Master can be only in one DC during automatic failover; manual failover may activate another DC --- Replicas ---> • One slave in the secondary DC is invisible for reads and is dedicated to periodic backups/snapshots (more on this later)  For reads, client first sets SlaveOK=true and if required document is not found flips to SlaveOK=false to read from Master -- DC2(Secondary)-->  Home-grown MongoDB configuration and monitoring agent is running on every node • Fetches MongoD configuration from a central configuration store and saves it to local config file • Manages lifecycle of MongoD B B B B • Monitors state and metrics ---- Shards -----
  11. 11. Zoom: Data Backup and Restore strategy  Goals: • Take periodic backups of the entire data set Application • Be able to recover from backup • Do not loose any writes that have happened after last snapshot • Briefly service unavailability during recovery is better than data Dual-write loss … to capped M collection C  Dual writes on the client • Regular write to main cluster … • Second write to another Mongo cluster: single replica set, capped collection, the data written is similar to REDO log record Recovery B Agent  Hidden slave in each shard has volume mounted on a remote storage appliance capable of instant file system snapshot; captures both DB files and journal files  If DB recovery is activated: • All MongoD on primary cluster are shutdown • NFS slave is remounted to snapshot volume Instant • MongoD on this machine is started as a masterShapshot • MongoD on other replica set members are started cold • Full sync-up from masterCapable • Master is switched to a regular member device • Writes that occurred since time when the backup was taken are replayed from the REDO log capped collection in the secondary cluster •
  12. 12. Key Learning MongoDB can be a very powerful tool but use it wisely Deletes can be slow; automatic balancer is dangerous; use it only when you must (example: be careful when adding new shards) Use explain for every query; disable full scans to discover inefficiencies early Query profiler is great Retry every failed query at least once; long tail in response times is possible when data set > RAM size
  13. 13. Questions? Thank you!