Yuri Finkelstein
Lead Platform Services Architect
yfinkelstein@ebay.com
John Feibusch
Lead DBA Engineer
jfeibusc@ebay.com
May 2013
About eBay Platform Services
 Platform Services is an organization within the larger eBay Platform
organization. It is responsible for developing and operating the
common services used by web applications
running on the eBay Platform
• Media Storage platform services: image blob and metadata
• Unified Monitoring platform: logs and metrics
• User Behavior Tracking
• Ad Content management and analytics
• Messaging and other middleware services
Platform Services and Media Metadata Service
Requirements
 Platform Services is a DevOps organization
• We develop, we test, we deploy, we operate, we monitor
• Whatever we are responsible for, we own and understand at the depth
of the entire stack
• Therefore, we require transparency of the components we build on
• Transparency at the level of source code visibility is ideal
Key Requirements
 Key requirements of Media Metadata Service
• 99.999% availability
• Strictly defined invocation latency at the 95th percentile
• Simultaneous operation in multiple data centers with short replication
latency
• Reliable writes: synchronous writes to at least 2 nodes.
• Read-write workload with a read/write ratio of roughly 10:1
• Agility, fluid metadata content; constantly changing business
requirements
• Terabyte scale, billions of small entities to store and query
• Extreme scalability: the number of pictures on eBay is constantly growing
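A latency requirement stated at the 95th percentile can be checked against measured samples. The sketch below uses the nearest-rank method; the function name and the sample values are made up for illustration and do not come from the eBay service.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; integer arithmetic first to avoid rounding surprises.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical invocation latencies in milliseconds.
const latenciesMs = [12, 7, 9, 31, 8, 10, 11, 9, 8, 14,
                     9, 10, 13, 8, 9, 10, 12, 95, 9, 11];
const p95 = percentile(latenciesMs, 95);
console.log(`p95 latency: ${p95} ms`);
```

A single slow outlier (95 ms here) does not move the 95th percentile, which is why an SLA at that level tolerates rare spikes.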
Enter MongoDB
 We have been operating MongoDB in this
project for over a year now
 Sharded cluster in 2 data centers
 Service nodes are built in Java and use
Morphia and Mongo driver
 MongoS runs on the service nodes
 In the first year we matured the cluster on
writes only; this year we are taking on reads
 Reads are from the user-facing web
applications with strong SLA requirements
 For reads, the client first sets SlaveOK=true
and, if the required document is not found, flips
to SlaveOK=false to read from the primary
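The secondary-first read with a fallback to the primary can be sketched as follows. This is a minimal illustration, not the service's code: `readSecondary` and `readPrimary` are hypothetical stand-ins for reads issued with SlaveOK=true and SlaveOK=false, backed here by in-memory maps.

```javascript
// Try the secondary first; on a miss, retry on the primary, because a
// miss on a secondary may only mean the document has not replicated yet.
function readWithFallback(id, readSecondary, readPrimary) {
  const fromSecondary = readSecondary(id);
  if (fromSecondary !== null && fromSecondary !== undefined) {
    return { doc: fromSecondary, source: "secondary" };
  }
  const fromPrimary = readPrimary(id);
  return { doc: fromPrimary ?? null, source: "primary" };
}

// In-memory stand-ins: the secondary has not yet replicated document "b".
const primaryData = new Map([["a", { _id: "a" }], ["b", { _id: "b" }]]);
const secondaryData = new Map([["a", { _id: "a" }]]);
const readPrimary = (id) => primaryData.get(id) ?? null;
const readSecondary = (id) => secondaryData.get(id) ?? null;

const hit = readWithFallback("a", readSecondary, readPrimary);
const lagged = readWithFallback("b", readSecondary, readPrimary);
const missing = readWithFallback("c", readSecondary, readPrimary);
console.log(hit.source, lagged.source, missing.doc);
```

The cost of this pattern is a second round trip for documents that genuinely do not exist, so it fits workloads where misses are rare.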
[Diagram: a sharded cluster whose replicas span two data centers (DC1 and DC2). Each shard has a primary (P) and a hidden member (H). Each Metadata Service node (S) contains the service layer, Morphia, the Mongo driver, and MongoS.]
S – service instance; P – primary mongod; H – hidden member
Centralized MongoDB configuration store
 Our MongoDB deployment package is based on a
custom-built RPM and contains extensive customization
scripts
 One of them is responsible for fetching configuration for
the node it’s running on from a remote configuration
repository at start-up time
 Benefits:
• Can change MongoDB configuration instantly on an arbitrarily
large number of nodes
• Can change local system settings affecting MongoDB:
read-ahead settings on block devices and the IO scheduler
• Can relocate replica set members across machines (subject
to data migration)
• Consistent inventory tracking, visibility into config settings
on any Mongo machine
[Diagram: mongod nodes (P) fetch their configuration from the Central MongoDB Config Repository at startup time.]
Upstart
 Upstart is a replacement for init.d; developed for Ubuntu, also used in
RHEL 6
 Can automatically start our monitoring agent whenever mongod starts.
Handles multiple mongod instances well
 Example:
 sudo start mongod interface=0
 Future: Upstart can be controlled by Puppet.
Run multiple MongoD instances on the same machine
 Starting to run multiple mongod processes on one node
 Instead of using multiple ports we create multiple virtual interfaces on a
single host and register them in DNS as if they were real IP addresses
 MongoD supports bind_ip which makes it possible to bind to a specific
virtual interface
 Why virtual interfaces?
 So that DB hosts can be moved with just a DNS change
 Why do we want to run multiple MongoD on a single host?
 On large machines with lots of disk IO and storage capacity, a single mongod cannot
utilize all IO resources
 Running multiple shards on the same machine splits the data into smaller units and
reduces the scope of each write lock.
 This works well only when the multiple MongoD instances on the same machine have similar
workloads
Home-grown MongoDB monitoring system
 Home-grown agent runs on
each MongoDB host and
collects very specific metrics
that are not available in
MMS:
• Per block-device disk write
latency and disk IOPS
• Details of per-collection
MongoDB metrics
 Can overlay multiple graphs
from replica set members on the
same chart
 GLE (getLastError) latency – very important
since we are doing
• getLastError({w:2})
Media Metadata Service: Data Model
 2 main collections: Item and Image
• Item references multiple Images
 Item represents eBay Item:
• _id in Item is external ID of the item in eBay site DB
• These IDs are already sharded in a balanced way across N
logical DB hosts using ID ranges
• We use MongoDB pre-split points for the initial
mapping of our N site DB shards to M MongoDB shards
• This ensures good balance between the shards
 Image represents a picture attached to an
Item
• _id in Image is based on modified ObjectID of Mongo
• This ensures good distribution across any number of
shards
 Our choice of document IDs in both
collections ensures good balance across
Mongo shards
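One way to derive pre-split points when N balanced ID ranges are mapped onto M shards is sketched below. The function name, the boundary values, and the assumption that every range carries the same load are all illustrative; the talk does not show how the split points were actually computed.

```javascript
// Given the N+1 ordered boundaries of N equally loaded ID ranges,
// pick M-1 split points so each shard receives a contiguous group of ranges.
function preSplitPoints(boundaries, shardCount) {
  const rangeCount = boundaries.length - 1; // N ranges
  if (shardCount < 1 || shardCount > rangeCount) {
    throw new RangeError("need 1 <= shardCount <= number of ranges");
  }
  const points = [];
  for (let s = 1; s < shardCount; s++) {
    // Shard s starts at the boundary closest to its proportional share s*N/M.
    const idx = Math.round((s * rangeCount) / shardCount);
    points.push(boundaries[idx]);
  }
  return points;
}

// Example: 6 site DB ranges of one million IDs each, mapped to 3 shards.
const boundaries = [0, 1e6, 2e6, 3e6, 4e6, 5e6, 6e6];
const splits = preSplitPoints(boundaries, 3);
console.log(splits);
```

In the mongo shell, each returned point could then be applied to the collection with `sh.splitAt()` before any data is loaded, so the balancer starts from an even layout.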
Problem #1: What should be the ID for the documents?
 ObjectId is not a good shard key for a sharded collection because the
timestamp occupies the first 4 bytes.
 Problem: how should the app generate the ID when this is
required?
 Requirements:
• Even distribution across shards both long term and short
term
• Localization of the placement of the indexed _id values in the
B-Tree – minimize the chance of a page fault on the index page
and increase the chance of coalescing dirty pages in the page
cache, to reduce the amount of random IO when flushing pages
to disk
• Compact size is always good for saving space
 One possible solution: 6 byte ID in the following order
• 1 byte – rotating sequence ID incremented by each writer on
every document
• 1 byte – writer ID; assuming number of writers < 256
• 4 byte – timestamp in seconds
 Works, with the limitation that each writer cannot insert more
than 256 documents per second
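A generator for the 6-byte layout described above (sequence, writer ID, timestamp) could look like this. It is a sketch under assumptions: the class name, the big-endian timestamp encoding, and the injectable clock are choices made for the example, not details from the talk.

```javascript
// 6-byte shard-friendly ID: [1 byte sequence][1 byte writer ID][4 byte timestamp].
class ShardFriendlyIdGenerator {
  constructor(writerId) {
    if (!Number.isInteger(writerId) || writerId < 0 || writerId > 255) {
      throw new RangeError("writerId must fit in one byte");
    }
    this.writerId = writerId;
    this.sequence = 0;
  }

  // nowSeconds is injectable so the example is deterministic.
  next(nowSeconds = Math.floor(Date.now() / 1000)) {
    const id = Buffer.alloc(6);
    id.writeUInt8(this.sequence, 0);            // rotating sequence comes first
    id.writeUInt8(this.writerId, 1);            // then the writer ID
    id.writeUInt32BE(nowSeconds >>> 0, 2);      // then seconds since the epoch
    this.sequence = (this.sequence + 1) & 0xff; // wrap around after 255
    return id;
  }
}

const gen = new ShardFriendlyIdGenerator(7);
const first = gen.next(1367366400).toString("hex");
const second = gen.next(1367366400).toString("hex");
console.log(first, second);
```

Because the sequence byte leads, consecutive inserts from one writer spread across the key space, while IDs sharing a sequence value and writer stay adjacent and ordered by time. The one-byte sequence is also the source of the 256 documents per second per writer limit.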
MongoDB ObjectId():
Timestamp (4 bytes) | MachineID (4 bytes) | SequenceNo (4 bytes)
Shard-Friendly ID:
SequenceNo (1 byte) | WriterID (1 byte) | Timestamp (4 bytes)
Shard-Friendly ID details
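[Figure: 6-byte ID value plotted against time, from the N-th minute to the (N+1)-th minute. Each sequence value (Seq=0, Seq=16, …, Seq=255) occupies its own band of ID values (00…, 0f…, 55…, aa…, ff…), with 20 contiguous ranges for each sequence.]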
Let’s say we have 20 writers and 3 shards
Number of contiguous intervals in each shard: 256/3 * 20 ≈ 1,700
Worst-case scenario: each contiguous range requires a separate IO. At 200 IOPS: ~8.5 sec to flush it
In reality it’s much better because of 4 KB pages
Rate of writes: 256 docs/sec per writer
Number of dirty locations over 1 minute: 256 * 60 * 20 = 307,200
So, if _id were an md5 or some other randomly generated value with ~perfect distribution, this would require roughly 180 times more IOPS
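The estimate can be recomputed directly from the stated inputs (256 sequence values, 20 writers, 3 shards, 256 docs/sec per writer, 200 IOPS). The variable names are illustrative, and the final ratio follows the slide's own comparison of dirty locations per minute against contiguous intervals per shard.

```javascript
const sequenceValues = 256;      // one-byte rotating sequence
const writers = 20;
const shards = 3;
const docsPerSecPerWriter = 256; // upper bound imposed by the sequence byte
const diskIops = 200;

// Each shard owns about a third of the sequence values; every
// (sequence, writer) pair forms one contiguous, time-ordered interval.
const intervalsPerShard = (sequenceValues / shards) * writers;

// Worst case: one IO per contiguous interval.
const worstCaseFlushSeconds = intervalsPerShard / diskIops;

// With random IDs every written document is its own dirty location.
const dirtyLocationsPerMinute = docsPerSecPerWriter * 60 * writers;
const randomIdPenalty = dirtyLocationsPerMinute / intervalsPerShard;

console.log({
  intervalsPerShard: Math.round(intervalsPerShard),
  worstCaseFlushSeconds: worstCaseFlushSeconds.toFixed(1),
  dirtyLocationsPerMinute,
  randomIdPenalty: Math.round(randomIdPenalty),
});
```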
Problem #2: md5 lookup problem
 MD5 is a digest of the image content; used for de-dupe
 Requirement: find image documents with a given
md5 value
 Option 1: secondary index on the image
documents; does not work because:
• Large DB, random reads cause disk IO
• Image collection is sharded by image ID;
forced to query all shards
 Option 2: Stand-alone replica set (cache)
• Works since data is compact and fits in RAM;
no disk IO
• How do we store md5->image IDs in Mongo?
• Option 2.1: As an array
 Does not work well since when refs are added
documents will grow and relocate.
• Option 2.2: Single Binary Packed into an ID
 Works; lookup is based on prefix search and
covering index
Option 2.1 document:
{
  _id: Binary(md5),
  ref: [ref1, ref2, ref3 …]
}
Option 2.2 document:
{
  _id: Binary(md5|ref)
}
Query:
db.coll.find(
  { _id: { $gt: Binary(md5|0x0000), $lt: Binary(md5|0xffff) } },
  { _id: 1 }
)
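The packed md5|ref key and its prefix range scan can be illustrated without a database. In this sketch the 4-byte reference width, the function names, and the sorted in-memory array standing in for the _id index are all assumptions; the bounds are inclusive so that references made of all-zero or all-one bytes are not skipped.

```javascript
const REF_BYTES = 4; // assumed width of the image reference

// Key layout: 16-byte md5 digest followed by the reference.
function packKey(md5, ref) {
  const key = Buffer.alloc(md5.length + REF_BYTES);
  md5.copy(key, 0);
  key.writeUInt32BE(ref, md5.length);
  return key;
}

// Smallest and largest possible keys sharing the md5 prefix.
function prefixBounds(md5) {
  return {
    lower: Buffer.concat([md5, Buffer.alloc(REF_BYTES, 0x00)]),
    upper: Buffer.concat([md5, Buffer.alloc(REF_BYTES, 0xff)]),
  };
}

// Range scan over sorted keys; only the keys are read, which is what
// makes the real query answerable from the index alone.
function refsFor(sortedKeys, md5) {
  const { lower, upper } = prefixBounds(md5);
  return sortedKeys
    .filter((k) => Buffer.compare(k, lower) >= 0 && Buffer.compare(k, upper) <= 0)
    .map((k) => k.readUInt32BE(md5.length));
}

const md5A = Buffer.from("00112233445566778899aabbccddeeff", "hex");
const md5B = Buffer.from("ffeeddccbbaa99887766554433221100", "hex");
const keys = [packKey(md5A, 10), packKey(md5B, 5), packKey(md5A, 42)]
  .sort(Buffer.compare);
const found = refsFor(keys, md5A);
console.log(found);
```

Adding a reference inserts a new small document instead of growing an existing one, which avoids the document relocation problem of the array layout.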
Problem #3: Item’s main picture size lookup
 Image document has image dimensions:
width and height
 Item document references N pictures; one of
them is main
 Problem: lookup image dimensions of the
item’s main picture for 50 item documents at
once with SLA for latency < 20 msec
 It’s a variation of Problem #2 except it’s
worse because ItemID and image
dimensions are in different documents and
50 lookups at once are required
 Again we need a dedicated replica set
 Option 1: prefix search with $or and $and
 Option 2: just query by _id
 Option 3: query by id but on another
compound index: {_id:1, wh:1}
 Winner is option #3! Hint: covering index
Option 1 document:
{ _id: Binary(item|WxH) }
Query:
db.coll.find({
  $or: [
    { _id: { $gt: Binary(id1|0x0000), $lt: Binary(id1|0xffff) } },
    { _id: { $gt: Binary(id2|0x0000), $lt: Binary(id2|0xffff) } },
    …
  ]
})
Option 2 document:
{ _id: item, wh: WxH }
Query:
db.coll.find(
  { _id: { $in: [item1, item2, …] } }
)
Option 3 document:
{ _id: item, wh: WxH }
Query:
db.coll.find(
  { _id: { $in: [item1, item2, …] } },
  { _id: 1, wh: 1 }
).hint({ _id: 1, wh: 1 })
Problem #4: Periodic export to Hadoop
 Problem: daily copy of the new or
updated documents to Hadoop
 Option 1: service does 2 writes: to
mongo and to hadoop
• Does not work since Hadoop is not an
online system
 Option 2: secondary index on
lastUpdated (date); then query on
lastUpdated > T
• Does not work well since updating the indexed
lastUpdated field is costly; also, consuming a
large number of docs from a live cluster is
disruptive to latency SLAs
 Option 3: OpLog replication
• Winner:
 decouples export from site activity
 makes the lastUpdated index unnecessary
[Diagram: the primaries (P P P), first on their own as the problem statement, then with an OpLog Listener attached.]
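The filtering step of an oplog listener can be sketched on plain objects. Oplog entries carry `ts`, `op` ("i" for insert, "u" for update, "d" for delete), `ns`, `o`, and, for updates, `o2` identifying the document. In this illustration timestamps are simplified to plain numbers, and the namespace and function name are hypothetical.

```javascript
// Collect the _id of every document inserted or updated in one
// namespace after a given oplog position.
function changedIdsSince(entries, namespace, sinceTs) {
  const ids = new Set();
  for (const e of entries) {
    if (e.ns !== namespace || e.ts <= sinceTs) continue;
    if (e.op === "i") ids.add(e.o._id);        // inserts carry the full document
    else if (e.op === "u") ids.add(e.o2._id);  // updates identify it in o2
  }
  return [...ids];
}

const oplog = [
  { ts: 100, op: "i", ns: "media.Image", o: { _id: "img1", w: 800 } },
  { ts: 105, op: "u", ns: "media.Image", o2: { _id: "img1" }, o: { $set: { w: 640 } } },
  { ts: 110, op: "i", ns: "media.Item", o: { _id: "item9" } },
  { ts: 120, op: "d", ns: "media.Image", o: { _id: "img0" } },
  { ts: 130, op: "i", ns: "media.Image", o: { _id: "img2" } },
];
const toExport = changedIdsSince(oplog, "media.Image", 100);
console.log(toExport);
```

Remembering the last processed `ts` lets the export resume where it stopped, so no lastUpdated field or index is needed on the documents themselves.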
Problem #5: What’s the fastest way to perform
a full scan?
 Problem: you have a huge database/collection,
with terabytes of data and billions of documents
 You need to perform a form of batch processing
on all the documents and you want the fastest
pipe out of mongo
 Option 1: Do it on a live node as it’s serving traffic
• Does not work well when the node is busy
• Also – data consistency may be an issue
 Ok, need to take the node off-line
 Option 2: execute a natural-order scan:
• Natural order cursor
• Works, but slow; lots of synchronization between the two
sides
 Option 3: N cursors using range query on _id or
any other indexed field
• Slow in the general case, when the order of indexed values
in the B-Tree and the order on disk do not match
 Option 4: N natural-order cursors
One cursor:
db.collection.find()
  .sort({$natural: 1})
N cursors (cursor i of N, with chunk = ceil(count / N) documents each):
db.collection.find()
  .sort({$natural: 1})
  .skip(i * chunk)
  .limit(chunk)
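Splitting a natural-order scan across N cursors comes down to computing a skip and limit for each one. The helper below is a sketch with hypothetical names; it assumes the document count is known up front and that the collection does not change during the scan, which is why the node is taken off-line first.

```javascript
// Divide totalDocs into at most cursorCount contiguous slices.
function naturalOrderSlices(totalDocs, cursorCount) {
  const chunk = Math.ceil(totalDocs / cursorCount);
  const slices = [];
  for (let i = 0; i < cursorCount; i++) {
    const skip = i * chunk;
    if (skip >= totalDocs) break; // fewer documents than cursors
    slices.push({ skip, limit: Math.min(chunk, totalDocs - skip) });
  }
  return slices;
}

const slices = naturalOrderSlices(10, 4);
console.log(slices);
```

Each slice maps to one cursor's `.skip()` and `.limit()` values, and the slices together cover every document exactly once.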
Summary
 We are running MongoDB in a demanding environment where it’s
exposed to business-sensitive online applications
 It seems to be reliable – this is what matters
 It has lots of features and gives the user lots of options to choose from
 It’s the user’s depth of understanding of the product and desire to
have visibility into every aspect of its performance that will determine
whether a particular use case will be a success or not
Questions?
 Thank you!
 Btw, if any of this sounds interesting, we have lots of
similar challenges to work on. So, you know the drill:
yfinkelstein at ebay dot com

Source: MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB, presented by Yuri Finkelstein, Architect, eBay

Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
MongoDB
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
musrath mohammad
 
MongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB at Gilt Groupe
MongoDB at Gilt Groupe
MongoDB
 
NoSQLEU: Different NoSQL tools in Production
NoSQLEU: Different NoSQL tools in ProductionNoSQLEU: Different NoSQL tools in Production
NoSQLEU: Different NoSQL tools in Production
Bit Zesty
 
MongoDB
MongoDBMongoDB
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
Kelly Technologies
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
Sandeep Singh
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
arslanhaneef
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
sonukumar379092
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
sporst
 
Mongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-finalMongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-final
MongoDB
 
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
Norberto Leite
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
Jeremy Zawodny
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
DataStax
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 

Similar to MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB presented by Yuri Finkelstein, Architect, eBay (20)

Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
MongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB at Gilt Groupe
MongoDB at Gilt Groupe
 
NoSQLEU: Different NoSQL tools in Production
NoSQLEU: Different NoSQL tools in ProductionNoSQLEU: Different NoSQL tools in Production
NoSQLEU: Different NoSQL tools in Production
 
MongoDB
MongoDBMongoDB
MongoDB
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
Mongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-finalMongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-final
 
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB presented by Yuri Finkelstein, Architect, eBay

  • 1. Yuri Finkelstein Lead Platform Services Architect yfinkelstein@ebay.com John Feibusch Lead DBA Engineer jfeibusc@ebay.com May 2013
  • 2. About eBay Platform Services
      - Platform Services is an org within the larger eBay Platform org. It is responsible for developing and operating common services used by web applications running on the eBay Platform
          • Media Storage platform services: image blob and metadata
          • Unified Monitoring platform: logs and metrics
          • User Behavior Tracking
          • Ad Content management and analytics
          • Messaging and other middleware services
  • 3. Platform Services and Media Metadata Service Requirements
      - Platform Services is a DevOps organization
          • We develop, we test, we deploy, we operate, we monitor
          • Whatever we are responsible for, we own and understand at the depth of the entire stack
          • Therefore, we require transparency of the components we build on
          • Transparency at the level of source code visibility is ideal
  • 4. Key Requirements
      - Key requirements of Media Metadata Service
          • 99.999% availability
          • Strictly defined invocation latency at the 95th percentile
          • Simultaneous operation in multiple data centers with short replication latency
          • Reliable writes: synchronous writes to at least 2 nodes
          • Read-write workload with a read/write ratio of roughly 10:1
          • Agility, fluid metadata content; constantly changing business requirements
          • Terabyte scale, billions of small entities to store and query
          • Extreme scalability: the number of pictures on eBay is constantly growing
  • 5. Enter MongoDB
      - We have been operating MongoDB in this project for over a year now
      - Sharded cluster in 2 data centers
      - Service nodes are built in Java and use Morphia and the Mongo driver
      - MongoS runs on the service nodes
      - In the 1st year we were maturing the cluster for writes only; this year we are taking reads
      - Reads come from the user-facing web applications with strong SLA requirements
      - For reads, the client first sets SlaveOK=true and, if the required document is not found, flips to SlaveOK=false to read from the Primary
      [Diagram: shards and their replicas spanning DC1 and DC2; each Metadata Service Node stacks the Service Layer, Morphia, the Mongo Driver, and MongoS. Legend: S = service instance; P = primary mongod; H = hidden member]
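The read path on slide 5 (try a secondary first, fall back to the primary on a miss) can be sketched as below. This is a minimal illustration, not the service's actual Java/Morphia code: `readWithFallback` and the `findOne(id, slaveOk)` callback are hypothetical names standing in for whatever driver call the service makes.

```javascript
// Hypothetical helper: findOne(id, slaveOk) stands in for the real driver call.
// slaveOk=true means the read may be served by a (possibly lagging) secondary.
function readWithFallback(findOne, id) {
  // First attempt: cheap read that is allowed to hit a secondary.
  const fromSecondary = findOne(id, true);
  if (fromSecondary !== null && fromSecondary !== undefined) {
    return { doc: fromSecondary, source: "secondary" };
  }
  // Miss: the secondary may simply be behind, so retry against the primary.
  const fromPrimary = findOne(id, false);
  return { doc: fromPrimary ?? null, source: "primary" };
}
```

The trade-off in this sketch is that a genuine miss costs two round trips, while a hit on a secondary never touches the primary.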
  • 6. Centralized MongoDB configuration store
      - Our MongoDB deployment package is based on a custom-built RPM and contains heavy customization scripts
      - One of them fetches the configuration for the node it is running on from a remote configuration repository at start-up time
      - Benefits:
          • Can change MongoDB configuration instantly on an arbitrarily large number of nodes
          • Can change local system settings affecting MongoDB: read-ahead settings on block devices and the IO scheduler
          • Can relocate replica set members across machines (subject to data migration)
          • Consistent inventory tracking, visibility into config settings on any Mongo machine
      [Diagram: primaries (P) fetching configuration from the Central MongoDB Config Repository at startup time]
  • 7. Upstart
      - Upstart is a replacement for init.d; developed for Ubuntu, also used in RHEL 6
      - Can automatically start our monitoring agent whenever mongod starts. Handles multiple mongod instances well
      - Example:
          sudo start mongod interface=0
      - Future: Upstart can be controlled by Puppet.
  • 8. Run multiple MongoD instances on the same machine
      - We are starting to run multiple mongod processes on one node
      - Instead of using multiple ports, we create multiple virtual interfaces on a single host and register them in DNS as if they were real IP addresses
      - MongoD supports bind_ip, which makes it possible to bind to a specific virtual interface
      - Why virtual interfaces?
          • So that DB hosts can be moved with just a DNS change
      - Why do we want to run multiple MongoD on a single host?
          • On large machines with lots of disk IO and storage capacity, a single mongod cannot utilize all IO resources
          • Running multiple shards on the same machine reduces data granularity and reduces the scope of each write lock
          • This works well only when the multiple MongoD on the same machine have similar workloads
  • 9. Home-grown MongoDB monitoring system
      - A home-grown agent runs on each MongoDB host and collects very specific metrics that are not available in MMS:
          • Per-block-device disk write latency and disk IOPS
          • Details of per-collection MongoDB metrics
      - Can overlay multiple graphs from replica set (RS) members on the same chart
      - GLE latency is very important since we are doing getLastError({w:2})
  • 10. Media Metadata Service: Data Model
      - 2 main collections: Item and Image
          • Item references multiple Images
      - Item represents an eBay Item:
          • _id in Item is the external ID of the item in the eBay site DB
          • These IDs are already sharded and balanced across N logical DB hosts using ID ranges
          • We use MongoDB pre-split points for the initial mapping of our N site DB shards to M MongoDB shards
          • This ensures good balance between the shards
      - Image represents a picture attached to an Item
          • _id in Image is based on a modified Mongo ObjectID
          • This ensures good distribution across any number of shards
      - Our choice of document IDs in both collections ensures good balance across Mongo shards
  • 11. Problem #1: What should be the ID for the documents?
      - ObjectId is not a good shard key for a sharded collection because the timestamp occupies the first 4 bytes
      - Problem: how should the app generate the ID when this is required?
      - Requirements:
          • Even distribution across shards, both long term and short term
          • Locality of the indexed _id values in the B-Tree: minimize the chance of a page fault on the index page and increase the chance of coalescing dirty pages in the page cache, to reduce the amount of random IO when flushing pages to disk
          • Compactness in size is always good to preserve space
      - One possible solution: a 6-byte ID in the following order
          • 1 byte: rotating sequence ID incremented by each writer on every document
          • 1 byte: writer ID, assuming the number of writers is < 256
          • 4 bytes: timestamp in seconds
      - Works with the limitation that each writer cannot insert more than 256 documents per second
      [Diagram: byte layouts as drawn on the slide]
          MongoDB ObjectId():  Timestamp (4) | MachineID (4) | SequenceNo (4)
          Shard-Friendly ID:   SequenceNo (1) | WriterID (1) | Timestamp (4)
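The 6-byte layout from slide 11 can be sketched as a small generator. This is an illustration under assumptions, not the service's real code: the name `makeIdGenerator`, the big-endian timestamp, and the use of a Node.js `Buffer` are all choices made here for the example.

```javascript
// Sketch of the slide's 6-byte ID: [sequence (1 byte)][writer ID (1 byte)][timestamp in seconds (4 bytes)].
// makeIdGenerator is a hypothetical name; byte order of the timestamp (big-endian) is an assumption.
function makeIdGenerator(writerId) {
  if (!Number.isInteger(writerId) || writerId < 0 || writerId > 255) {
    throw new RangeError("writerId must fit in one byte (0..255)");
  }
  let seq = 0; // rotating per-writer sequence, wraps after 255
  return function nextId(nowMs = Date.now()) {
    const id = Buffer.alloc(6);
    id.writeUInt8(seq, 0);                                // leading byte spreads writes across shards
    id.writeUInt8(writerId, 1);                           // keeps IDs from different writers distinct
    id.writeUInt32BE(Math.floor(nowMs / 1000) >>> 0, 2);  // timestamp in seconds
    seq = (seq + 1) & 0xff;
    return id;
  };
}
```

Because the sequence byte leads, consecutive inserts from one writer land in 256 different key ranges, while IDs sharing a sequence value stay ordered by writer and time. The sequence wraps after 256 values, which is where the slide's limit of 256 documents per writer per second comes from.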
  • 12. Shard-Friendly ID details
      [Diagram: 6-byte ID value plotted against time (N-th minute, N+1-th minute) for Seq=0, Seq=16, … Seq=255, with ID prefixes 00…, 0f…, 55…, aa…, ff…; 20 contiguous ranges for each sequence]
      - Let's say we have 20 writers and 3 shards
      - Number of contiguous intervals in each shard: 256/3 * 20 = 1100
      - Worst-case scenario: each contiguous range requires a separate IO. At 200 IOPS: ~5 sec to flush it
      - In reality it is much better because of 4 k pages
      - Rate of writes: 256 docs/sec
      - Number of dirty locations over 1 minute: 256 * 60 * 20 = 307,000
      - So, if _id were md5 or some other random value generator with ~perfect distribution, this would require 300 times more IOPS
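The estimate on slide 12 can be re-derived from its stated inputs (20 writers, 3 shards, 256 sequence values, 200 IOPS). The function name `flushEstimate` is made up for this sketch. Evaluated as written, 256/3 * 20 comes to roughly 1,700 intervals and about 8.5 seconds rather than the quoted 1100 and ~5 sec, so the slide's figures are best read as order-of-magnitude estimates.

```javascript
// Back-of-the-envelope check of the slide's numbers; all inputs come from the slide.
function flushEstimate({ writers, shards, seqValues, iops }) {
  // One contiguous interval per (sequence value, writer) pair, split across shards.
  const rangesPerShard = (seqValues / shards) * writers;
  // Worst case: every interval costs one IO.
  const flushSeconds = rangesPerShard / iops;
  // With a random _id, every document written in a minute is its own dirty location.
  const randomDirtyPerMinute = seqValues * 60 * writers;
  return { rangesPerShard, flushSeconds, randomDirtyPerMinute };
}

const estimate = flushEstimate({ writers: 20, shards: 3, seqValues: 256, iops: 200 });
console.log(estimate.rangesPerShard.toFixed(0));   // 1707
console.log(estimate.flushSeconds.toFixed(1));     // 8.5
console.log(estimate.randomDirtyPerMinute);        // 307200
```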
  • 13. Problem #2: md5 lookup problem
      - Md5 is a digest of the image content; used for de-dupe
      - Requirement: find image documents with a given md5 value
      - Option 1: secondary index on the image documents; does not work because:
          • Large DB, random reads cause disk IO
          • The image collection is sharded by image ID, so we are forced to query all shards
      - Option 2: stand-alone replica set (cache)
          • Works since the data is compact and fits in RAM; no disk IO
          • How do we store md5->image IDs in Mongo?
          • Option 2.1: as an array
              { _id: Binary(md5), ref: [ref1, ref2, ref3, …] }
            Does not work well: when refs are added, documents grow and relocate
          • Option 2.2: a single binary value packed into the ID
              { _id: Binary(md5|ref) }
              Query: db.coll.find( { _id: { $gt: Binary(md5|0x0000) } }, { _id: 1 } )
            Works; lookup is based on a prefix search and a covering index
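Option 2.2 on slide 13 packs the md5 and the reference into one binary `_id` and finds all references by prefix. A minimal sketch of building such keys and the bounds of a prefix range follows. The helper names, the fixed reference width, and the inclusive `$gte`/`$lte` bounds are assumptions made here; the slide itself shows only a `$gt` lower bound. With a real driver the buffers would be wrapped in its binary type before querying.

```javascript
// Hypothetical helpers for the md5|ref packed key.
function packKey(md5Hex, refHex) {
  if (!/^[0-9a-f]{32}$/i.test(md5Hex)) throw new Error("md5 must be 32 hex characters");
  if (!/^([0-9a-f]{2})+$/i.test(refHex)) throw new Error("ref must be whole hex bytes");
  // Key bytes are the md5 digest followed by the reference.
  return Buffer.concat([Buffer.from(md5Hex, "hex"), Buffer.from(refHex, "hex")]);
}

// Bounds covering every key that starts with the given md5, for refs of refBytes bytes.
function prefixRange(md5Hex, refBytes) {
  return {
    $gte: packKey(md5Hex, "00".repeat(refBytes)),
    $lte: packKey(md5Hex, "ff".repeat(refBytes)),
  };
}
```

The returned object has the shape of the `_id` condition in a query such as `db.coll.find({ _id: range }, { _id: 1 })`, which only needs the `_id` index to answer.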
  • 14. Problem #3: Item's main picture size lookup
      - The Image document has image dimensions: width and height
      - The Item document references N pictures; one of them is the main picture
      - Problem: look up the image dimensions of the item's main picture for 50 item documents at once, with a latency SLA of < 20 msec
      - It is a variation of Problem #2, except it is worse because ItemID and image dimensions are in different documents and 50 lookups at once are required
      - Again we need a dedicated replica set
      - Option 1: prefix search with $or and $and
          { _id: Binary(item|WxH) }
          Query: db.coll.find({ $or: [
                   { _id: { $gt: Binary(id1|0x0000), $lt: Binary(id1|0xffff) } },
                   { _id: { $gt: Binary(id2|0x0000), $lt: Binary(id2|0xffff) } },
                   … ] })
      - Option 2: just query by _id
          { _id: item, wh: WxH }
          Query: db.coll.find({ _id: { $in: [item1, item2, …] } })
      - Option 3: query by _id but on another compound index: {_id:1, wh:1}
          { _id: item, wh: WxH }
          Query: db.coll.find({ _id: { $in: [item1, item2, …] } }).hint({_id:1, wh:1})
      - Winner is option #3! Hint: covering index
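Option 3 on slide 14 combines an `$in` lookup with a hint on the compound index so the query can be answered from the index alone. The pieces of such a query can be assembled as below. `mainPictureSizeQuery` is a hypothetical helper, and the projection is an assumption added here, since a query is only covered when it returns nothing beyond the indexed fields.

```javascript
// Hypothetical helper: builds the three parts of the option 3 lookup.
function mainPictureSizeQuery(itemIds) {
  if (!Array.isArray(itemIds) || itemIds.length === 0) {
    throw new Error("itemIds must be a non-empty array");
  }
  return {
    filter: { _id: { $in: itemIds } },  // one round trip for the whole batch
    projection: { _id: 1, wh: 1 },      // only fields present in the index
    hint: { _id: 1, wh: 1 },            // force the compound index from the slide
  };
}
```

In the mongo shell the result would be used roughly as `db.coll.find(q.filter, q.projection).hint(q.hint)`.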
  • 15. Problem #4: Periodic export to Hadoop
      - Problem: daily copy of the new or updated documents to Hadoop
      - Option 1: the service does 2 writes: to Mongo and to Hadoop
          • Does not work since Hadoop is not an online system
      - Option 2: secondary index on lastUpdated (date); then query on lastUpdated > T
          • Does not work well since updating an indexed lastUpdated is costly; also, consuming a large number of docs from a live cluster is disruptive to latency SLAs
      - Option 3: OpLog replication
          • Winner: decouples export from site activity and makes the lastUpdated index unnecessary
      [Diagram: primaries (P) and an OpLog Listener, with the listener marked "??" as the open problem]
  • 16. Problem #5: What's the fastest way to perform a full scan?
      - Problem: you have a huge database/collection, with terabytes of data and billions of documents
      - You need to perform a form of batch processing on all the documents and you want the fastest pipe out of Mongo
      - Option 1: do it on a live node as it is serving traffic
          • Does not work well when the node is busy
          • Also, data consistency may be an issue
      - OK, we need to take the node off-line
      - Option 2: execute a natural-order scan
          • Natural-order cursor
          • Works, but slow; lots of synchronization between the two sides
          One cursor: db.collection.find().sort({$natural: 1})
      - Option 3: N cursors using a range query on _id or any other indexed field
          • Slow in the general case, when the order of indexed values in the B-Tree and the order on disk do not match
      - Option 4: N natural-order cursors
          N cursors: db.collection.find().sort({$natural: 1}).skip(i * K).limit(K), for i = 0 … N-1, with K documents per cursor
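Option 4 on slide 16 splits one natural-order scan into N skip/limit windows. A sketch of computing those windows is below. `scanWindows` is a hypothetical helper, and it assumes the total document count is known up front (for example from a prior count on the off-line node).

```javascript
// Hypothetical helper: split totalDocs into at most `cursors` contiguous skip/limit windows.
function scanWindows(totalDocs, cursors) {
  if (!Number.isInteger(cursors) || cursors < 1) {
    throw new RangeError("cursors must be a positive integer");
  }
  const chunk = Math.ceil(totalDocs / cursors); // documents per cursor
  const windows = [];
  for (let skip = 0; skip < totalDocs; skip += chunk) {
    // The last window may be shorter than the others.
    windows.push({ skip, limit: Math.min(chunk, totalDocs - skip) });
  }
  return windows;
}

console.log(scanWindows(10, 3));
// [ { skip: 0, limit: 4 }, { skip: 4, limit: 4 }, { skip: 8, limit: 2 } ]
```

Each window would then drive one cursor of the form `find().sort({$natural: 1}).skip(w.skip).limit(w.limit)`. The windows only partition the collection cleanly if nothing is written during the scan, which matches the slide's point about taking the node off-line first.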
  • 17. Summary
      - We are running MongoDB in a demanding environment where it is exposed to business-sensitive online applications
      - It seems to be reliable, and this is what matters
      - It has lots of features and gives the user lots of options to choose from
      - It is the user's depth of understanding of the product, and the desire to have visibility into every aspect of its performance, that will determine whether a particular use case will be a success or not
  • 18. Questions?
      - Thank you!
      - Btw, if any of this sounds interesting, we have lots of similar challenges to work on. So, you know the drill: yfinkelstein at ebay dot com

Editor's Notes

  1. Show app servers and mongos on them
  2. Fix md5->new document ID
  3. 3 shards, 20 writers