An Introduction to NoSQL:
  Theory and Practice

      Nosh Petigara
      nosh@10gen.com
      @noshinosh
Today
• Why new databases?


• What is "NoSQL"?


• A brief introduction to MongoDB


• Some real-world use cases
Database Evolution

                   RDBMS
                     Oracle,
                    MySQL,
                   PostgreSQL




       OLAP                     NoSQL
        Netezza,                MongoDB,
        Vertica,                CouchDB,
        Hadoop                  Cassandra
Why NoSQL?

Big Data
• Explosion of data and our desire to make meaningful decisions from that data

New Programming Models
• The existing data model is an impediment to agile development

New Hardware Architecture
• The cloud is starting to become the dominant deployment architecture. Databases need to take advantage of horizontal scaling capacity
Trends
What should my database be like?
Agile
• Enable faster development cycle
• Deal with structured and unstructured data

Scalable
• Billions of objects, high read/write volume, terabytes/petabytes

Cloud-ready
• Cost-effectively operationalize data in cloud-like environments
The Great Divide




     Sweet spot: Agile, Flexible, Scalable
How do I evaluate a NoSQL/Big Data Solution?

• Real-time vs. Batch

• Data Model

• Distribution Model + Consistency
Some comparisons
•   Real-time vs. Batch
    • Real-time: MongoDB, Cassandra, Membase, RDBMS
    • Batch: Hadoop, traditional data warehousing/BI


•   Data models
    • Relational: Oracle, MySQL, etc.
    • Key-value: Membase, Redis
    • Document: MongoDB, CouchDB
    • Column/Tabular: Cassandra


•   Distribution & consistency
    • Eventual consistency: Cassandra, Dynamo (S3, SimpleDB), Riak
    • Regular consistency: MongoDB, Oracle, etc.
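
To make the data-model comparison concrete, here is a minimal sketch in plain JavaScript of one blog post held in three shapes: relational rows, a key-value entry, and a document. It uses no real database; the table names, key format, and helper functions are all assumptions made for illustration.

```javascript
// Relational shape: two "tables" linked by post_id; reading a post needs a join.
const postRows = [{ id: "A4304", author: "nosh", title: "Intro to MongoDB" }];
const commentRows = [{ post_id: "A4304", author: "mike", votes: 7 }];

function readRelational(id) {
  const post = postRows.find((p) => p.id === id);
  return {
    author: post.author,
    title: post.title,
    // The "join": collect the comment rows that point at this post.
    comments: commentRows
      .filter((c) => c.post_id === id)
      .map((c) => ({ author: c.author, votes: c.votes })),
  };
}

// Key-value shape: the store sees one opaque value per key.
const kvStore = new Map([
  [
    "post:A4304",
    JSON.stringify({
      author: "nosh",
      title: "Intro to MongoDB",
      comments: [{ author: "mike", votes: 7 }],
    }),
  ],
]);

function readKeyValue(id) {
  // The application, not the store, interprets the value.
  return JSON.parse(kvStore.get("post:" + id));
}

// Document shape: a nested structure whose fields the store can see and query.
const documents = [
  {
    _id: "A4304",
    author: "nosh",
    title: "Intro to MongoDB",
    comments: [{ author: "mike", votes: 7 }],
  },
];

function readDocument(id) {
  const { _id, ...rest } = documents.find((d) => d._id === id);
  return rest;
}
```

All three reads return the same logical post. The difference lies in where the structure lives: spread across tables, hidden inside a value, or visible to the store as nested fields.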
Some commonalities
• No Joins


• Relaxed transactional semantics


• No joins + simple transactions -> easier
  horizontal scalability
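
One way to see why dropping joins helps scale-out is a toy hash-sharded store: each document lives on exactly one shard chosen from its key, so a single-document read or write touches one node. This is a sketch under assumptions, not any particular product's scheme; the shard count, hash choice, and function names are hypothetical.

```javascript
// Toy hash-based sharding over in-memory Maps (hypothetical names throughout).
const SHARD_COUNT = 3;
const shards = Array.from({ length: SHARD_COUNT }, () => new Map());

// Deterministic 32-bit string hash (FNV-1a).
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return h;
}

// Pick the shard that owns a key.
function shardFor(key) {
  return hashKey(key) % SHARD_COUNT;
}

// Single-document write: goes to exactly one shard.
function put(doc) {
  shards[shardFor(doc._id)].set(doc._id, doc);
}

// Single-document read: asks exactly one shard.
function get(id) {
  return shards[shardFor(id)].get(id);
}
```

A join or a multi-document transaction would have to coordinate several shards at once, which is the cost these systems avoid by keeping operations scoped to one document.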
What to look for
• Can I model my data?


• Can I query my data?


• Can I update my data?


• Does it support my operational needs?
NoSQL in Practice: MongoDB
• Open source


• Non-relational, document-oriented


• Dynamic Schemas


• Regular consistency: Scale-out by auto-sharding (similar to Google File System)
Data as Documents: A blog post
{
  _id: "A4304",                           // primary key
  author: "nosh",                         // simple values
  date: new Date("2010-06-22"),
  title: "Intro to MongoDB",
  text: "MongoDB is an open source..",
  tags: ["webinar", "opensource"],        // array
  comments: [                             // embedded documents
    {
      author: "mike",
      date: new Date("2010-11-18"),
      txt: "Did you see the…",
      votes: 7
    },
    // …
  ]
}
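
As a rough illustration of what querying and updating such a document involves, the sketch below keeps the blog post in a plain JavaScript array and matches on the array field and the embedded comments. It runs without a database. The helper names and the second commenter are hypothetical, and the dates assume the slide means 22 June 2010 and 18 November 2010. In MongoDB itself, the two updates would roughly correspond to the `$push` and `$inc` update operators.

```javascript
// In-memory stand-in for a collection of blog posts.
const blogPosts = [
  {
    _id: "A4304",
    author: "nosh",
    date: new Date("2010-06-22"),
    title: "Intro to MongoDB",
    text: "MongoDB is an open source..",
    tags: ["webinar", "opensource"],
    comments: [
      { author: "mike", date: new Date("2010-11-18"), txt: "Did you see the…", votes: 7 },
    ],
  },
];

// Query: posts carrying a given tag (match inside an array field).
function findByTag(tag) {
  return blogPosts.filter((p) => p.tags.includes(tag));
}

// Query: posts with at least one comment by a given author
// (match inside embedded documents).
function findByCommenter(name) {
  return blogPosts.filter((p) => p.comments.some((c) => c.author === name));
}

// Update: append a comment to one post; new comments start with zero votes.
function addComment(postId, comment) {
  const post = blogPosts.find((p) => p._id === postId);
  if (!post) return false;
  post.comments.push({ ...comment, votes: 0 });
  return true;
}

// Update: add one vote to the first comment by the given author.
function upvote(postId, commentAuthor) {
  const post = blogPosts.find((p) => p._id === postId);
  const comment = post && post.comments.find((c) => c.author === commentAuthor);
  if (!comment) return false;
  comment.votes += 1;
  return true;
}
```

Both updates change a single document in place, with no second table to keep in sync.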
MongoDB is:
                Application
                                Document
                                Oriented
                              { author: "roger",
                               date: new Date(),
                               text: "Spirited Away",
                               tags: ["Tezuka", "Manga"]}




 Horizontally Scalable
Photo Meta-Data

Problem:
• Store metadata for billions of photos and videos
• Business needed more flexibility than Oracle could deliver

Solution:
• Used MongoDB instead of Oracle


Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle

                  http://10gen.com/customers
Real-time Customer Analytics
Problem:
• Track customer activity in real-time across huge
• Deal with massive data volume across all customer sites

Solution:
• Used MongoDB to replace Google Analytics / Omniture options

Results:
• Less than 1 week to build prototype and prove business case
• Rapid deployment of new features (1/day, 1/week)



                 http://10gen.com/customers
Data Archiving

Problem:
• Archive years of postings for compliance
• RDBMS could not handle evolving schemas

Solution:
• Used MongoDB to replace MySQL


Results:
• Less than 1 week to build prototype and prove business case
• Rapid deployment of new features (1/day, 1/week)


                 http://10gen.com/customers
Next Steps
• nosh@10gen.com
• @noshinosh


• http://mongodb.org
• http://10gen.com/presentations


• http://10gen.com/jobs
Next Sp
• Easy to start
• Easy to develop
• Easy to scale

Why Organizations are Looking at Alternative Database Technologies – Introduction to NoSQL
