MongoDB Common Use Cases

Emerging NoSQL Space

[Figure: timeline of the database landscape]
• The beginning: RDBMS
• Last 10 years: RDBMS, Data Warehouse
• Today: RDBMS, Data Warehouse, NoSQL

Qualities of NoSQL Workloads

Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development

High Throughput
• Lots of reads
• Lots of writes

Large Data Sizes
• Aggregate data size
• Number of objects

Low Latency
• Both reads and writes
• Millisecond latency

Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs

Commodity Hardware
• Ethernet
• Local disks

MongoDB was designed for this

Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development
MongoDB:
• JSON based object model
• Dynamic schemas

High Throughput
• Lots of reads
• Lots of writes
MongoDB:
• Replica Sets to scale reads
• Sharding to scale writes

Large Data Sizes
• Aggregate data size
• Number of objects
MongoDB:
• 1000's of shards in a single DB
• Partitioning of data

Low Latency
• Both reads and writes
• Millisecond latency
MongoDB:
• In-memory cache
• Scale-out working set

Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs
MongoDB:
• Scale-out to overcome hardware limitations

Commodity Hardware
• Ethernet
• Local disks
MongoDB:
• Designed for "typical" OS and local file system

Example customers

• Content Management
• Operational Intelligence
• Product Data Management
• User Data Management
• High Volume Data Feeds

USE CASES THAT LEVERAGE NOSQL

High Volume Data Feeds

Machine Generated Data
• More machines, more sensors, more data
• Variably structured

Stock Market Data
• High frequency trading

Social Media Firehose
• Multiple sources of data
• Each changes its format constantly

High Volume Data Feed

[Diagram: multiple data sources feeding writes to MongoDB]
• Asynchronous writes
• Flexible document model can adapt to changes in sensor format
• Write to memory with periodic disk flush
• Scale writes over multiple shards

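The idea of scaling writes over multiple shards can be sketched in a few lines of plain JavaScript. This is a toy illustration under stated assumptions, not how MongoDB routes writes internally: the hash function, the `shardFor` and `routeWrites` helpers, and the `sensor_id` shard key are all hypothetical names chosen for the example.

```javascript
// Toy illustration only. MongoDB's own routing is more involved;
// every name below is hypothetical.

// Deterministic 32-bit FNV-1a-style hash over a string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Pick a shard index for a document from the value of its shard key.
function shardFor(doc, shardKey, shardCount) {
  return fnv1a(String(doc[shardKey])) % shardCount;
}

// Distribute a batch of documents into per-shard buckets.
function routeWrites(docs, shardKey, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc, shardKey, shardCount)].push(doc);
  }
  return shards;
}

// Sensor readings with differing fields, as a flexible document model allows.
const readings = [
  { sensor_id: "s-1", temp_c: 21.5 },
  { sensor_id: "s-2", humidity: 0.4, firmware: "2.1" },
  { sensor_id: "s-3", temp_c: 19.0, battery: 0.87 },
  { sensor_id: "s-1", temp_c: 21.7 },
];

const buckets = routeWrites(readings, "sensor_id", 3);
buckets.forEach((bucket, i) => console.log(`shard ${i}: ${bucket.length} docs`));
```

Because the shard index depends only on the key value, every reading from one sensor lands on the same shard, while different sensors spread the write load.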
Operational Intelligence

Ad Targeting
• Large volume of state about users
• Very strict latency requirements

Real time dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time

Social Media Monitoring
• What are people talking about?

Operational Intelligence

[Diagram: an API and dashboards backed by MongoDB]
• Low latency reads
• Parallelize queries across replicas and shards
• In database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store, and report on data

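To make "in database aggregation" concrete, here is a minimal sketch. The `pipeline` constant uses MongoDB's aggregation operators (`$match`, `$group`, `$sum`) but is only shown as data and is not executed here. The `viewsPerPage` function is a plain-JavaScript emulation of that one pipeline so the result shape is visible without a server. The event documents and field names are assumptions made up for the example.

```javascript
// Hypothetical event documents that a dashboard might report on.
const events = [
  { page: "/home", type: "view" },
  { page: "/home", type: "view" },
  { page: "/pricing", type: "view" },
  { page: "/home", type: "click" },
];

// The aggregation a dashboard could send to the database (not executed here).
const pipeline = [
  { $match: { type: "view" } },
  { $group: { _id: "$page", views: { $sum: 1 } } },
];

// Local emulation of the pipeline above: keep only views, count per page.
function viewsPerPage(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.type !== "view") continue; // the $match stage
    counts.set(doc.page, (counts.get(doc.page) || 0) + 1); // the $group stage
  }
  return Array.from(counts, ([page, views]) => ({ _id: page, views }));
}

console.log(viewsPerPage(events));
```

Running the grouping inside the database means only the small summary travels to the dashboard, not every raw event.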
Behavioral Profiles

User journey: 1 See Ad, 2 See Ad, 3 Click, 4 Convert

• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to track vendor specific attributes
• Indexing and querying to support matching, frequency capping

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop",
          sku: "asdf23f",
          time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}

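Frequency capping against a behavioral profile like the one on this slide can be sketched as follows. The profile document is copied from the slide; the `countImpressions` and `underCap` helpers, and the idea of expressing the cap as a count since a given time, are assumptions for illustration rather than anything the deck specifies.

```javascript
// Profile document from the slide, with straight quotes so it parses.
const profile = {
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 },
      ],
    },
  },
};

// Count how often one ad was shown to this user at or after `since`.
// (Hypothetical helper.)
function countImpressions(profile, advertiser, adId, since) {
  const entry = profile.advertiser[advertiser];
  if (!entry) return 0; // no history with this advertiser
  return entry.actions.filter(
    (action) => action.impression === adId && action.time >= since
  ).length;
}

// True while the ad may still be served under the cap. (Hypothetical helper.)
function underCap(profile, advertiser, adId, since, cap) {
  return countImpressions(profile, advertiser, adId, since) < cap;
}

console.log(underCap(profile, "apple", "ad2", 0, 3));
```

Since each action is just an element of the `actions` array, adding a new action type does not require a schema change.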
Product Data Management

E-Commerce Product Catalog
• Diverse product portfolio
• Complex querying and filtering

Flash Sales
• Scale for short bursts of high-volume traffic
• Scalable but consistent view of inventory

Content Management

News Site
• Comments and user generated content
• Personalization of content, layout

Multi-Device rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages

Sharing
• Store large objects
• Simple modeling of metadata

Content Management

• GridFS for large object storage
• Flexible data model for similar, but different objects
• Geo spatial indexing for location based searches
• Horizontal scalability for large data sets

{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}

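A location based search over photo metadata like the documents on this slide can be sketched as below. This is a linear scan in plain JavaScript that emulates the question a geospatial index answers; a real index avoids scanning every document. The `haversineKm` and `photosNear` helpers are hypothetical, `location` is read as [longitude, latitude] as on the slide, and `ISODate` is replaced with a standard `Date` because it only exists in the MongoDB shell.

```javascript
const EARTH_RADIUS_KM = 6371;

function toRad(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance between two [longitude, latitude] pairs.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Similar but different objects: only the first one carries a location.
const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] },
  {
    camera: "Canon 5d mkII",
    people: ["Jim", "Carol"],
    taken_on: new Date("2012-03-07T18:32:35.002Z"),
  },
];

// Return photos within radiusKm of center; photos without a location are skipped.
function photosNear(docs, center, radiusKm) {
  return docs.filter(
    (doc) =>
      Array.isArray(doc.location) &&
      haversineKm(doc.location, center) <= radiusKm
  );
}

console.log(photosNear(photos, [-122.4, 37.78], 5).map((p) => p.camera));
```

The second photo has no `location` field at all, which is the "similar, but different objects" case: the search simply ignores it instead of needing a placeholder value.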
User Data Management

Video Games
• User state and session management

Social Graphs
• Scale out to large graphs
• Easy to search and process

Identity Management
• Authentication, Authorization, and Accounting

Social Graphs

[Diagram: social graph]
• Native support for Arrays makes it easy to store connections inside user profile
• Sharding partitions user profiles across available servers
• Documents enable disk locality of all profile data for a user

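Storing connections as an array inside each user profile can be sketched like this. The profile shape (`_id`, `name`, `friends`) and the helper functions are assumptions for illustration; an in-memory `Map` stands in for a collection keyed by user id.

```javascript
// Hypothetical user profiles; each one embeds its connections as an array.
const users = new Map([
  ["alice", { _id: "alice", name: "Alice", friends: ["bob", "carol"] }],
  ["bob", { _id: "bob", name: "Bob", friends: ["alice", "carol", "dave"] }],
  ["carol", { _id: "carol", name: "Carol", friends: ["alice", "bob"] }],
  ["dave", { _id: "dave", name: "Dave", friends: ["bob"] }],
]);

// One lookup returns the profile together with all of its connections.
function friendsOf(users, id) {
  const user = users.get(id);
  return user ? user.friends : [];
}

// Connections that two users have in common.
function mutualFriends(users, a, b) {
  const friendsOfB = new Set(friendsOf(users, b));
  return friendsOf(users, a).filter((id) => friendsOfB.has(id));
}

// Second-degree connections: friends of friends, excluding the user
// and the people they are already connected to.
function friendsOfFriends(users, id) {
  const direct = new Set(friendsOf(users, id));
  const result = new Set();
  for (const friendId of direct) {
    for (const candidate of friendsOf(users, friendId)) {
      if (candidate !== id && !direct.has(candidate)) result.add(candidate);
    }
  }
  return Array.from(result);
}

console.log(mutualFriends(users, "alice", "bob"));
console.log(friendsOfFriends(users, "alice"));
```

Reading one profile yields the full list of connections in a single fetch, which is the locality benefit the slide describes.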
IS MY USE CASE A GOOD FIT FOR MONGODB?

Good fits for MongoDB

Application Characteristic        Why MongoDB might be a good fit

Variable data in objects          Dynamic schema and JSON data model enable
                                  flexible data storage without sparse tables or
                                  complex joins

Low Latency Access                Memory Mapped storage engine caches
                                  documents in RAM, enabling in-memory
                                  performance. Data locality of documents can
                                  significantly improve latency over join-based
                                  approaches

High write or read throughput     Sharding + Replication lets you scale read and
                                  write traffic across multiple servers

Large number of objects to store  Sharding lets you split objects across multiple
                                  servers

Cloud-based deployment            Sharding and replication let you work around
                                  hardware limitations in clouds.

THANK YOU!



Free online training available for MongoDB at:

http://education.10gen.com
