NoSQL Databases & Managing Big Data
Talking about
What is BIG Data
NoSQL
MongoDB
Future of BIG Data
@spf13 AKA Steve Francia
15+ years building the internet
Father, husband, skateboarder
Chief Solutions Architect @ 10gen,
responsible for drivers,
integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Management from: Google/DoubleClick, Oracle, Apple, NetApp, MarkLogic
Well funded: Sequoia, Union Square, Flybridge
What is BIG data?
2000
Google Inc. today announced it has released
the largest search engine on the
Internet.

Google’s new index, comprising
more than 1 billion URLs...
2008
Our indexing system for processing
links indicates that
we now count 1 trillion unique URLs

(and the number of individual web
pages out there is growing by
several billion pages per day).
Data Growth
[Chart: Millions of URLs by year. 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000]
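As a rough illustration of how steep this curve is, the sketch below computes the average year-over-year growth factor implied by the chart's two endpoints (1 in 2000 and 1,000 in 2008). The function name `averageAnnualGrowth` is made up for this example; it is a back-of-the-envelope calculation, not something taken from the talk.

```javascript
// Average year-over-year growth factor implied by two endpoints.
// Inputs are the first and last data points read off the chart.
function averageAnnualGrowth(startValue, endValue, years) {
  return Math.pow(endValue / startValue, 1 / years);
}

const growthFactor = averageAnnualGrowth(1, 1000, 2008 - 2000);

// Roughly 2.37: on average the figure more than doubled every year.
console.log(growthFactor.toFixed(2));
```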
An unprecedented
amount of data is
being created and is
accessible
What good is it if
we can’t utilize this
data?
What is
NoSQL
What is NoSQL?
Key/Value, Column, Graph, Document
Key-Value Stores
A mapping from a key to a value
The store doesn't know anything about the
key or value
The store doesn't know anything about the
insides of the value
Operations :
•Set, get, or delete a key-value pair
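The three operations above can be sketched with a toy in-memory store. This is only an illustration built on JavaScript's `Map`; the class name `KeyValueStore` and the key `"user:42"` are hypothetical, and no real key-value product is being modelled.

```javascript
// A toy key-value store. The store never looks inside the value:
// it only maps a key to whatever blob the caller hands it.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if something was removed
  }
}

const kv = new KeyValueStore();

// The caller serializes the value; to the store it is an opaque string.
kv.set("user:42", JSON.stringify({ name: "Steve" }));

// Only the caller knows how to interpret what comes back.
const user = JSON.parse(kv.get("user:42"));

kv.delete("user:42");
```

Because the store cannot see inside the value, there is no way to ask it for "all users named Steve"; that limitation is what the other store types address.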
Column-Oriented Stores
Like a relational store, but flipped around: all
data for a column is kept together
An index provides a means to get a column
value for a record
Operations:
 •Get, insert, delete records; update fields
Streaming column data in and out of Hadoop
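A minimal sketch of the "flipped around" layout, assuming a tiny made-up data set: each column lives in its own array, and an index maps a record id to its position so a single column value can be fetched for a record. The names `toColumns`, `positionById` and `getValue` are invented for this example.

```javascript
// Row-oriented layout: one object per record (sample data is hypothetical).
const rows = [
  { id: 1, city: "New York", visits: 10 },
  { id: 2, city: "London", visits: 7 },
  { id: 3, city: "Dublin", visits: 5 },
];

// Column-oriented layout: one array per column, aligned by position.
function toColumns(records) {
  const columns = {};
  records.forEach((record, position) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field][position] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);

// Index from record id to position, used to get a column value for a record.
const positionById = new Map(columns.id.map((id, position) => [id, position]));

function getValue(id, field) {
  return columns[field][positionById.get(id)];
}

// Aggregating one column reads only that column's array.
const totalVisits = columns.visits.reduce((sum, v) => sum + v, 0);
```

Keeping a column's data together is what makes whole-column scans and aggregates cheap, at the cost of touching several arrays to reassemble one full record.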
Graph Databases
Stores vertex-to-vertex edges
Operations:
 •Getting and setting edges
 •Sometimes possible to annotate vertices
 or edges
Query languages support finding paths
between vertices, subject to various
constraints
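To make the edge operations and path queries concrete, here is a small sketch that stores directed edges in an adjacency map and finds a shortest path with breadth-first search. It stands in for what a graph query language would do for you; the `Graph` class and the vertex names are hypothetical.

```javascript
// Toy directed graph that stores vertex-to-vertex edges.
class Graph {
  constructor() {
    this.edges = new Map(); // vertex -> Set of neighbouring vertices
  }
  setEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }
  getEdges(from) {
    return [...(this.edges.get(from) || [])];
  }
  // Breadth-first search: returns a shortest path as a list of vertices,
  // or null when the goal cannot be reached from the start.
  findPath(start, goal) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const next of this.getEdges(vertex)) {
        if (!previous.has(next)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const graph = new Graph();
graph.setEdge("alice", "bob");
graph.setEdge("bob", "carol");
graph.setEdge("alice", "dave");

const path = graph.findPath("alice", "carol"); // ["alice", "bob", "carol"]
```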
Document Stores
The store is a container for documents
Documents are made up of named fields
   (think object/array/dict/hash...)
Can query on any document field(s)
Operations:
•Insert and delete documents
•Update fields within documents
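The operations listed above can be sketched with a toy in-memory container. This is an illustration of the idea only, not how MongoDB or any other document store is implemented; `DocumentStore` and its method names are made up, and matching is limited to simple field equality.

```javascript
// Toy document store: a container of documents made up of named fields.
class DocumentStore {
  constructor() {
    this.docs = [];
  }
  insert(doc) {
    this.docs.push({ ...doc }); // store a shallow copy
  }
  // Return documents whose fields equal every field named in the query.
  find(query) {
    return this.docs.filter((doc) =>
      Object.entries(query).every(([field, value]) => doc[field] === value));
  }
  // Set the given fields on every matching document; returns how many changed.
  update(query, fields) {
    const matched = this.find(query);
    matched.forEach((doc) => Object.assign(doc, fields));
    return matched.length;
  }
  // Delete matching documents; returns how many were removed.
  remove(query) {
    const matched = new Set(this.find(query));
    this.docs = this.docs.filter((doc) => !matched.has(doc));
    return matched.size;
  }
}

const store = new DocumentStore();
store.insert({ name: "10gen HQ", city: "New York", zip: "10011" });
store.insert({ name: "Cafe", city: "London" }); // documents need not share fields

store.update({ name: "10gen HQ" }, { zip: "10012" }); // hypothetical new value
const inNewYork = store.find({ city: "New York" });
```

Unlike the key-value sketch, the store here can see the fields, which is what makes querying on any field possible.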
                                                                                         MySQL
Data Model    Columns            Key:Value           Columns        Documents            Relational
Consistency   Eventual / Quorum  Eventual / Quorum   Strong         Strong               Strong
Availability  Multi-Master       Multi-Master        Single Master  Single Master        Single Master
Partitioning  Hash               Hash                Range          Range or Hash        N/A
Query         Thrift, CQL        Native Drivers (6)  REST, Thrift   Native Drivers (12)  SQL
Introduction to
MongoDB
What do we want in an ideal world?
•Horizontal scaling
  •cloud compatible
  •works with standard servers
•Fast
•Development is easy
  •Features
  •The Right Data Model
  •Schema Agility
MongoDB philosophy
 Keep functionality when we can (key/value
 stores are great, but we need more)
 Non-relational (no joins) makes scaling
 horizontally practical
 Document data models are good
 Database technology should run anywhere:
 virtualized, cloud, metal, etc.
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files
i.e. read-through write-through
memory caching.
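Database Landscape
[Chart: Scalability & Performance (vertical axis) against Depth of Functionality (horizontal axis). Plotted from left to right: Memcached, MongoDB, RDBMS, with RDBMS lowest on the vertical axis]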
“MongoDB has the best features of key/value stores, document databases and relational databases in one.”
John Nunemaker
Relational made normalized data look like this
Category: Name, Url
Article: Name, Slug, Publish date, Text
User: Name, Email Address
Tag: Name, Url
Comment: Comment, Date, Author
Document databases make normalized data look like this
Article: Name, Slug, Publish date, Text, Author
  Comment[]: Comment, Date, Author
  Tag[]: Value
  Category[]: Value
User: Name, Email Address
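As a sketch of what this document model might look like in practice, the object below embeds comments, tags and categories inside a single article. All field values are hypothetical, and storing the author as a user name is an assumption made for the example rather than something the slide specifies.

```javascript
// Hypothetical article document: related data is embedded, not joined.
const article = {
  name: "Intro to document stores",
  slug: "intro-to-document-stores",
  publishDate: new Date(2012, 0, 15), // months are zero-based: 15 January 2012
  text: "...",
  author: "steve", // assumed to refer to a User document by name
  comments: [
    { comment: "Nice post", date: new Date(2012, 0, 16), author: "fred" },
  ],
  tags: ["nosql", "mongodb"],
  categories: ["databases"],
};

// One read returns the article together with everything embedded in it.
const commentAuthors = article.comments.map((c) => c.author);
```

The trade-off is that User stays a separate document, so anything that needs the author's email address still requires a second lookup.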
MongoDB
Start with an object
(or array, hash, dict, etc.)

place1 = {
  name    : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city    : "New York",
  zip     : "10011",
  tags    : [ "business", "awesome" ]
}
Inserting the record
Initial Data Load

> db.places.insert(place1)
Querying
{
  name    : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city    : "New York",
  zip     : "10011",
  tags    : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011", tags: "awesome" })

> db.places.find({ tags: "business" })
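The queries above rely on one rule worth spelling out: when a field holds an array, an equality condition matches if the array contains the value. The sketch below imitates that rule in plain JavaScript so it can run without a database. The helper names and the second sample document are invented, and this is a simplified model, not the server's actual matching code.

```javascript
// Does a field satisfy a query value? Array fields match when they contain it.
function fieldMatches(fieldValue, queryValue) {
  return Array.isArray(fieldValue)
    ? fieldValue.includes(queryValue)
    : fieldValue === queryValue;
}

// A document matches when every field named in the query matches.
function matches(doc, query) {
  return Object.keys(query).every((field) => fieldMatches(doc[field], query[field]));
}

// find returns all matching documents; findOne returns the first, or null.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}
function findOne(collection, query) {
  return collection.find((doc) => matches(doc, query)) || null;
}

const placesData = [
  { name: "10gen HQ", city: "New York", zip: "10011", tags: ["business", "awesome"] },
  { name: "Corner Deli", city: "New York", zip: "10011", tags: ["business"] }, // hypothetical
];

const awesomePlace = findOne(placesData, { zip: "10011", tags: "awesome" });
const businesses = find(placesData, { tags: "business" });
```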
Nested Documents
{ _id     : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  name    : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city    : "New York",
  zip     : "10011",
  tags    : [ "business", "awesome" ],
  tips    : [{
    author : "Fred",
    date   : "Sat Apr 25 2010 20:51:03",
    text   : "Best Place Ever!"
  }]
}
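Updating
> db.places.update(
  {name : "10gen HQ"},
  { $push :
     { tips :
         { author : "nosh",
           // a bare 6/26/2011 would be evaluated as division, so build a Date
           // (months are zero-based: 5 is June)
           date : new Date(2011, 5, 26),
           text : "Office hours are great!"
         }
     }
  }
)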
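To show what the `$push` modifier does to the stored document, here is a simplified stand-in written in plain JavaScript: it appends a value to an array field and creates the array when the field is missing. The function `applyPush` is hypothetical and covers only this basic behaviour.

```javascript
// Simplified model of $push: append each value to the named array field.
function applyPush(doc, pushSpec) {
  for (const [field, value] of Object.entries(pushSpec)) {
    if (doc[field] === undefined) doc[field] = []; // create the array if absent
    if (!Array.isArray(doc[field])) {
      throw new Error(`${field} is not an array`);
    }
    doc[field].push(value);
  }
  return doc;
}

const place = {
  name: "10gen HQ",
  tips: [{ author: "Fred", text: "Best Place Ever!" }],
};

applyPush(place, {
  tips: { author: "nosh", date: new Date(2011, 5, 26), text: "Office hours are great!" },
});
```

The update changes the document in place: the existing tip is untouched and the new one is added at the end of the array, without rewriting the rest of the document from the client.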
MongoDB
Use Cases
CMS / Blog
Needs:
• Business needed modern data store for rapid development and
  scale

Solution:
• Use PHP & MongoDB

Results:
• Real time statistics
• All data, images, etc. stored together:
  easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more ALTER TABLE statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
  RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Rapid builds, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL
Tons more
MongoDB casts a wide net.

People keep coming up with
new and brilliant ways to use it.
In Good Company




      and 1000s more
The Future of BIG Data
What is BIG?
  BIG today is
normal tomorrow
Data Growth
[Chart: Millions of URLs by year. 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000]
2012
Generating over
250 million
tweets per day
MongoDB enables
us to scale with
the redefinition
of BIG.
MongoDB
High Performance
Easy Development
Horizontally Scalable
{ author : "steve",
  date   : new Date(),
  text   : "About MongoDB...",
  tags   : ["tech", "database"] }
http://spf13.com
http://github.com/s
@spf13

Questions?
Download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com