NoSQL Now 2012: MongoDB Use Cases


Presentation by 10gen CEO and Co-Founder Dwight Merriman

  • In the beginning, there was RDBMS, and if you needed to store data, that was what you used. But RDBMS is performance critical, and BI workloads tended to suck up system resources. So we carved off the data warehouse as a place to store a copy of the operational data for use in analytical queries. This offloaded work from the RDBMS and bought us cycles to scale higher. Today, we’re seeing another split. There’s a new set of workloads that are saturating RDBMS, and these are being carved off into yet another tier of our data architecture: the NoSQL store.

    1. MongoDB and NoSQL Use Cases (Dwight Merriman, 10gen)
    2. Trends
       • More data
       • Complex data
       • Cloud computing + computer architecture trends ->
         – Many commodity-type servers rather than one large server; commodity-type storage
       • Fast application start->deploy expectations ->
         – Agile software development methodologies / iteration
         – Service-oriented architectures
    3. Wants
       • Horizontal scaling
       • Ability to store complex data and deal with the malleability of real-world schemas without pain
       • Works with my (object-oriented) programming language without friction
       • Works with my frequent release cycles (iteration) without friction
       • High single-server performance
       • Cloud-friendly
    4. Wants -> We Need
       (The slide-3 list again, annotated with what we need: a way to scale out, and a new data model.)
    5. Approach
       • A new data model gives us a way to scale, and a way to solve our development wants
       • Goals for the data model:
         – Maintain data separation from code
         – Low friction and low mapping cost to our programming language
         – Malleability for adapting to constant changes in the real world
         – Ability to deal with polymorphic data
    6. Approach
       • Rich documents + partitioning
         – Each document lives on one shard (partitioning)
       • The catch:
         – No complex transactions
    7. Approach (repeat of slide 6)
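To make the "rich documents + partitioning" idea concrete, here is a minimal sketch in plain Python (a dict standing in for a stored BSON document, not the MongoDB API; the order/customer field names are illustrative, not from the deck). Because the whole aggregate is embedded in one document, it lives on a single shard and a read needs no join:

```python
# A hedged sketch: one "rich document" holds the whole aggregate,
# where a normalized relational design would need separate orders and
# line_items tables joined at read time.
order = {
    "_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "asdf23f", "desc": "laptop", "qty": 1, "price": 999.0},
        {"sku": "qwer88",  "desc": "mouse",  "qty": 2, "price": 25.0},
    ],
}

# The document is self-contained, so computing over it needs no join:
total = sum(i["qty"] * i["price"] for i in order["items"])
print(total)  # 1049.0
```

The trade-off stated on the slide follows directly: since each document is the unit of distribution, atomicity stops at the document boundary, hence no complex multi-document transactions.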
    8. Wants (thus implying use cases…)
       • Horizontal scaling
       • Ability to store complex data and deal with the malleability of real-world schemas without pain
       • Works with my (object-oriented) programming language without friction
       • Works with my frequent release cycles (iteration) without friction
       • High single-server performance
       • Cloud-friendly
       • Caveats / trade-offs:
         – No complex transactions
    9. When should you consider using MongoDB?
       • You find yourself coding around database performance issues – for example, adding lots of caching.
       • You are storing data in flat files.
       • You are batch processing yet you need real-time.
       • You are doing agile development, e.g., Scrum.
       • Your data is complex to model in a relational db – e.g., a complex derivative security; electronic health records; ...
       • Your project is late :-)
       • You are forced to use expensive SANs, proprietary servers, or proprietary networks for your existing db.
       • You are deploying to a public or private cloud.
    10. When should you use something else?
        • Problems requiring SQL.
        • Systems with a heavy emphasis on complex transactions, such as banking systems and accounting.
        • Traditional non-realtime data warehousing (sometimes). Traditional relational data warehouses and variants (columnar relational) are well suited for certain business intelligence problems – especially if you need SQL for your client tool (e.g., MicroStrategy). Exceptions where MongoDB is good:
          – cases where the analytics are realtime
          – cases where the data is very complicated to model relationally
          – when the data volume is huge
          – when the source data is already in a MongoDB database
    11. (Diagram: evolution of the data tier)
        • The beginning: RDBMS only
        • Last 10 years: RDBMS + data warehouse
        • Today: RDBMS + data warehouse + NoSQL DB
    12. Example users
        • Content Management
        • Operational Intelligence
        • Meta Data Management
        • User Data Management
        • High Volume Data Feeds
    13. High Volume Data Feeds
        • Machine Generated Data – more machines, more sensors, more data; variably structured
        • Stock Market Data – high-frequency trading
        • Social Media Firehose – multiple sources of data; each changes their format constantly
    14. Operational Intelligence
        • Ad Targeting – large volume of state about users; very strict latency requirements
        • Real-time dashboards – expose report data to millions of customers; report on large volumes of data; reports that update in real time
        • Social Media Monitoring – what are people talking about?
    15. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers’ website traffic
        • Problem: Intuit hosts more than 500,000 websites and wanted to collect and analyze conversion data to recommend lead-generation improvements to customers. With 10 years’ worth of user data, it took several days to process the information using a relational database.
        • Why MongoDB / Impact: in one week Intuit became proficient in MongoDB development; developed application features more quickly for MongoDB than for relational databases; MongoDB was 2.5 times faster than MySQL.
        • “We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, ‘Let’s go with this.’” – Nirmala Ranganathan, Intuit
    16. Marketing Personalization – rich profiles collecting multiple complex actions
        • Event timeline: (1) see ad, (2) see ad, (3) click, (4) convert
        • Scale out to support high throughput of activities tracked
        • Dynamic schemas make it easy to track vendor-specific attributes
        • Indexing and querying to support matching, frequency capping
        Example profile document:
          { cookie_id: "1234512413243",
            advertiser: {
              apple: {
                actions: [
                  { impression: 'ad1', time: 123 },
                  { impression: 'ad2', time: 232 },
                  { click: 'ad2', time: 235 },
                  { add_to_cart: 'laptop', sku: 'asdf23f', time: 254 },
                  { purchase: 'laptop', time: 354 }
                ]
              }
            }
          }
    17. Meta Data Management
        • Data Archiving – meta data about artifacts; content in the library
        • Information Discovery – you have data sources that you don’t have access to; store meta-data on those stores and figure out which ones have the content
        • Biometrics – retina scans; fingerprints
    18. Meta data
        • Indexing and rich query API for easy searching and sorting:
            db.archives.find({ "country": "Egypt" });
        • Flexible data model for similar, but different objects:
            { type: "Artefact", medium: "Ceramic", country: "Egypt", year: "3000 BC" }
            { ISBN: "00e8da9b", type: "Book", country: "Egypt", title: "Ancient Egypt" }
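The point of this slide is that one query matches documents with different shapes. A minimal plain-Python sketch of that behavior (an equality-only matcher mimicking the slide's `db.archives.find({...})`, not the pymongo API):

```python
# Two "documents" with different shapes, as on the slide.
archives = [
    {"type": "Artefact", "medium": "Ceramic", "country": "Egypt", "year": "3000 BC"},
    {"ISBN": "00e8da9b", "type": "Book", "country": "Egypt", "title": "Ancient Egypt"},
]

def find(collection, query):
    # Simplified matcher: a document matches if every key/value pair in
    # the query is present in the document (equality only).
    return [d for d in collection if all(d.get(k) == v for k, v in query.items())]

hits = find(archives, {"country": "Egypt"})
print(len(hits))  # 2 -- both shapes match the same query
```

An artifact and a book share no common schema beyond the fields they happen to have, yet `find({"country": "Egypt"})` returns both, while `find({"type": "Book"})` returns only the second.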
    19. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes
        • Problem: managing 20TB of data (six billion images for millions of customers), partitioning by function; a home-grown key-value store on top of their Oracle database offered sub-par performance; the codebase for this hybrid store became hard to manage; high licensing and hardware costs.
        • Why MongoDB: JSON-based data structure; provided Shutterfly with an agile, high-performance, scalable solution at a low cost; works seamlessly with Shutterfly’s services-based architecture.
        • Impact: 500% cost reduction and 900% performance improvement compared to the previous Oracle implementation; accelerated time-to-market for nearly a dozen projects on MongoDB; improved performance by reducing average insert latency from 400ms to 2ms.
        • The “really killer reason” for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to developing software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. – Kenny Gorman, Director of Data Services
    20. Content Management
        • News Site – comments and user-generated content; personalization of content and layout
        • Multi-Device rendering – generate layout on the fly for each device that connects; no need to cache static pages
        • Sharing – store large objects; simple modeling of metadata
    21. Content Management
        • Geospatial indexing for location-based searches
        • GridFS for large object storage
        • Flexible data model for similar, but different objects
        • Horizontal scalability for large data sets
        Example documents:
          { camera: "Nikon d4", location: [ -122.418333, 37.775 ] }
          { camera: "Canon 5d mkII", people: [ "Jim", "Carol" ], taken_on: ISODate("2012-03-07T18:32:35.002Z") }
          { origin: "", license: "Creative Commons CC0", size: { dimensions: [ 124, 52 ], units: "pixels" } }
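GridFS, mentioned on the slide for large object storage, works by splitting a file into fixed-size chunks stored as ordinary documents (in current drivers the default chunk size is 255 KiB, with chunks kept in an `fs.chunks` collection alongside an `fs.files` metadata document). A rough plain-Python sketch of that chunking idea, not the GridFS API itself:

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in current drivers (255 KiB)

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    # Each entry here corresponds to one fs.chunks document, ordered by
    # its sequence number n so the file can be streamed back in order.
    return [
        {"n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]

blob = b"x" * (CHUNK_SIZE * 2 + 10)  # a "large object" just over two chunks
chunks = split_into_chunks(blob)
print(len(chunks))  # 3
```

Reassembling the chunks in `n` order reproduces the original bytes exactly, which is what lets a photo-sharing site like the slide's example store objects far larger than a single document's size limit.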
    22. Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
        • Problem: analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources; adding too much data too quickly resulted in outages, with tables locked for tens of seconds during inserts; initially launched entirely on MySQL but quickly hit performance roadblocks.
        • Why MongoDB: migrated 5 billion records in a single day with zero downtime; MongoDB powers every website request: 20m API calls per day; ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error.
        • Impact: reduced code by 75% compared to MySQL; fetch time cut from 400ms to 60ms; sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second; significant cost savings and a 15% reduction in servers.
        • “Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” – Tony Tam, Vice President of Engineering and Technical Co-founder
    23. www.10gen.com / www.mongodb.org – Dwight Merriman, 10gen