MongoDB and NoSQL
Use Cases



Dwight Merriman, 10gen
Trends
• More data
• Complex data
• Cloud computing and computer architecture trends:
  – Many commodity-type servers rather than one large server; commodity-type storage
• Fast start-to-deploy expectations for applications:
  – Agile software development methodologies / iteration
  – Service-oriented architectures
Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single server performance
• Cloud-friendly
Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single server performance
• Cloud-friendly

We Need
• A way to scale out
• A new data model
Approach
• A new data model gives us a way to scale, and a way to solve our development wants

• Goals for the data model:
  – Maintain data separation from code
  – Low friction and mapping cost with our programming language
  – Malleability for adapting to the constant changes of the real world
  – Ability to deal with polymorphic data
Approach
• Rich documents + Partitioning
  – Each document lives on one shard (partitioning); see the sketch below

• The catch:
  – No complex transactions
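To make the partitioning point concrete, here is a minimal, hypothetical mongo-shell sketch; the database, collection, and shard key are assumptions for illustration, not taken from the deck. A rich document keeps an order and its line items together, so it lives on exactly one shard:

  // Hypothetical "shop" database: one rich document holds the order
  // plus its embedded line items.
  db.orders.insert({
    _id: 1001,
    customer: { name: "Ada", city: "London" },
    items: [
      { sku: "asdf23f", qty: 1, price: 999 },
      { sku: "qwer77",  qty: 2, price: 25 }
    ]
  });

  // Partition the collection: each document is routed to exactly one
  // shard, so a single-document update is atomic without any
  // distributed transaction.
  sh.enableSharding("shop");
  sh.shardCollection("shop.orders", { _id: 1 });

An update that must touch documents on two different shards at once is exactly the "complex transaction" the slide flags as the trade-off.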
Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single server performance
• Cloud-friendly

• Caveats / trade-offs:
  – No complex transactions

Thus implying use cases…
When should you consider using MongoDB?

• You find yourself coding around database performance issues, for example by adding lots of caching.
• You are storing data in flat files.
• You are batch processing yet you need real-time.
• You are doing agile development, e.g., Scrum.
• Your data is complex to model in a relational db, e.g., a complex derivative security, electronic health records, ...
• Your project is late :-)
• You are forced to use expensive SANs, proprietary servers, or proprietary networks for your existing db.
• You are deploying to a public or private cloud.
When should you use something else?

• Problems requiring SQL.
• Systems with a heavy emphasis on complex transactions, such as banking systems and accounting.
• Traditional non-realtime data warehousing (sometimes). Traditional relational data warehouses and variants (columnar relational) are well suited for certain business intelligence problems, especially if you need SQL for your client tool (e.g., MicroStrategy). Exceptions where MongoDB is good:
  • cases where the analytics are realtime
  • cases where the data is very complicated to model relationally
  • when the data volume is huge
  • when the source data is already in a MongoDB database
[Diagram: evolution of the data tier. The beginning: a single RDBMS. Last 10 years: the RDBMS plus a Data Warehouse carved off for analytical workloads. Today: RDBMS, Data Warehouse, and a NoSQL DB for the new workloads.]
Example users
• Content Management
• Operational Intelligence
• Meta Data Management
• User Data Management
• High Volume Data Feeds
High Volume Data Feeds

Machine Generated Data
• More machines, more sensors, more data
• Variably structured (see the ingestion sketch below)

Stock Market Data
• High frequency trading

Social Media Firehose
• Multiple sources of data
• Each changes their format constantly
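A minimal, hypothetical mongo-shell sketch of "variably structured" ingestion; the feed collection and its field names are assumptions for illustration, not from the deck. Documents from different sources land in the same collection with different shapes, and nothing has to be migrated when a source changes its format:

  // Each source writes whatever fields it has; no schema change needed.
  db.feed.insert({ src: "sensor-17", temp_c: 21.4, ts: new Date() });
  db.feed.insert({ src: "twitter", user: "alice", text: "hello", ts: new Date() });

  // One index on the shared fields serves every document shape.
  db.feed.ensureIndex({ src: 1, ts: -1 });
  db.feed.find({ src: "sensor-17" }).sort({ ts: -1 }).limit(10);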
Operational Intelligence

Ad Targeting
• Large volume of state about users
• Very strict latency requirements

Real time dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time (see the counter sketch below)

Social Media Monitoring
• What are people talking about?
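One common way to get reports that update in real time is to keep pre-aggregated counters that are bumped atomically on every event. This is a hedged sketch under assumed collection and field names, not code from the deck:

  // Hypothetical counter document, one per page per day; each event
  // performs a single atomic $inc, and upsert creates the document
  // on the first hit.
  db.pagestats.update(
    { page: "/pricing", day: "2012-08-21" },
    { $inc: { views: 1, "by_hour.14": 1 } },
    { upsert: true }
  );

  // The dashboard reads an already-aggregated document, so nothing
  // needs to be scanned at query time.
  db.pagestats.find({ page: "/pricing", day: "2012-08-21" });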
Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic.

Problem
• Intuit hosts more than 500,000 websites
• Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database

Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" -Nirmala Ranganathan, Intuit
Marketing Personalization

Funnel tracked per visitor: 1. See Ad → 2. See Ad → 3. Click → 4. Convert

• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to track vendor-specific attributes
• Indexing and querying to support matching, frequency capping (see the update sketch below)

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
    }
  }
}
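A hedged sketch of how such a profile might be maintained and queried; the profiles collection and the index are assumptions for illustration, while the field names mirror the sample document above:

  // Append one tracked action atomically; upsert creates the profile
  // on first contact.
  db.profiles.update(
    { cookie_id: "1234512413243" },
    { $push: { "advertiser.apple.actions": { impression: "ad3", time: 401 } } },
    { upsert: true }
  );

  // A multikey index over the embedded actions supports frequency
  // capping: find profiles with any apple action after a cutoff time.
  db.profiles.ensureIndex({ cookie_id: 1, "advertiser.apple.actions.time": 1 });
  db.profiles.find({
    cookie_id: "1234512413243",
    "advertiser.apple.actions.time": { $gt: 300 }
  });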
Meta Data Management

Data Archiving
• Meta data about artifacts
• Content in the library

Information discovery
• Have data sources that you don't have access to
• Store meta-data on those stores and figure out which ones have the content

Biometrics
• Retina scans
• Finger prints
Meta data

• Indexing and rich query API for easy searching and sorting (see the index sketch below)

db.archives.find({ "country": "Egypt" });

• Flexible data model for similar, but different objects:

{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}

{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
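A small sketch of the indexing side; the ensureIndex call is an assumed illustration, not from the deck. One index and one query API serve both document shapes, since each document simply omits the fields it does not have:

  // One compound index covers artifacts and books alike.
  db.archives.ensureIndex({ country: 1, type: 1 });

  // Search and sort across the heterogeneous documents.
  db.archives.find({ country: "Egypt" }).sort({ type: 1 });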
Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes.

Problem
• Managing 20TB of data (six billion images for millions of customers), partitioning by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing, HW costs

Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
• Works seamlessly with Shutterfly's services-based architecture

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400ms to 2ms

The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. -Kenny Gorman, Director of Data Services
Content Management

News Site
• Comments and user-generated content
• Personalization of content, layout

Multi-Device rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages

Sharing
• Store large objects
• Simple modeling of metadata
Content Management

• GridFS for large object storage
• Flexible data model for similar, but different objects
• Geospatial indexing for location-based searches (see the query sketch below)
• Horizontal scalability for large data sets

{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
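A hedged sketch of the geospatial piece; the photos collection and the index call are assumed for illustration (GridFS itself is accessed through the drivers or the mongofiles tool rather than shown here). The location field follows the [longitude, latitude] sample document above:

  // 2d geospatial index (2012-era syntax) on the location field.
  db.photos.ensureIndex({ location: "2d" });

  // The ten photos nearest a point in San Francisco.
  db.photos.find({ location: { $near: [ -122.42, 37.77 ] } }).limit(10);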
Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus: 3.5T of data in 20 billion records.

Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers

Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application. -Tony Tam, Vice President of Engineering and Technical Co-founder
www.10gen.com
www.mongodb.org


Dwight Merriman, 10gen

NoSQL Now 2012: MongoDB Use Cases


Editor's Notes

  • #13 In the beginning, there was RDBMS, and if you needed to store data, that was what you used. But RDBMS is performance critical, and BI workloads tended to suck up system resources. So we carved off the data warehouse as a place to store a copy of the operational data for use in analytical queries. This offloaded work from the RDBMS and bought us cycles to scale higher. Today, we’re seeing another split. There’s a new set of workloads that are saturating RDBMS, and these are being carved off into yet another tier of our data architecture: the NoSQL store.