SlideShare a Scribd company logo
1 of 54
#MongoDC




Schema Design
Gary J. Murakami, Ph.D.
Lead Engineer and Ruby Evangelist
Algorithms – Data Structures              xkcd.com




            =?
          Schema Design – Gary Murakami
Chess (Northwestern University)
   Larry Atkin & Dave Slate
          Schema Design – Gary Murakami
Agenda
• Working with documents
• Evolving a Schema
• Queries and Indexes
• Common Patterns




                 Schema Design – Gary Murakami
RDBMS                                MongoDB
Database        ➜ Database
Table           ➜ Collection
Row             ➜ Document
Index           ➜ Index
Join            ➜ Embedded Document
Foreign Key     ➜ Reference

Terminology
              Schema Design – Gary Murakami
Working with
Documents
Modeling Data
         Schema Design – Gary Murakami
Documents
Provide flexibility and
performance



         Schema Design – Gary Murakami
Normalized Data
         Schema Design – Gary Murakami
De-Normalized (embedded)
Data
         Schema Design – Gary Murakami
Relational Schema Design
Focus on data storage




          Schema Design – Gary Murakami
Document Schema Design
Focus on data use




         Schema Design – Gary Murakami
Schema Design
Considerations
• How do we manipulate the data?
   – Dynamic Ad-Hoc Queries
   – Atomic Updates
   – Map Reduce

• What are the access patterns of the application?
   – Read/Write Ratio
   – Types of Queries / Updates
   – Data life-cycle and growth rate




                      Schema Design – Gary Murakami
Data Manipulation
• Query Selectors
  – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
  – Vector: $in, $nin, $all, $size

• Atomic Update Operators
  – Scalar: $inc, $set, $unset
  – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet




                      Schema Design – Gary Murakami
Data Access
• Flexible Schemas
• Ability to embed complex data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
   – $project, $match, $limit, $skip, $sort, $group, $unwind

• No Joins


                       Schema Design – Gary Murakami
Endgame tablebase
Play chess with God – Ken
        Thompson
       Schema Design – Gary Murakami
Getting Started
Library Management
Application
• Patrons
• Books
• Authors
• Publishers




               Schema Design – Gary J. Murakami
An Example
One to One Relations




         Schema Design – Gary Murakami
Modeling Patrons

                                              patron = {
patron = {                                      _id: "joe",
  _id: "joe",                                   name: "Joe Bookreader",
  name: "Joe Bookreader”                        address: {
}                                                  street: "123 Fake St. ",
                                                   city: "Faketon",
address = {                                        state: "MA",
  patron_id = "joe",                               zip: 12345
  street: "123 Fake St. ",                      }
  city: "Faketon",                            }
  state: "MA",
  zip: 12345
}



                             Schema Design – Gary Murakami
One to One Relations
• Mostly the same as the relational approach
• Generally good idea to embed “contains”
 relationships
• Document model provides a holistic
 representation of objects




                  Schema Design – Gary Murakami
An Example
One To Many Relations




         Schema Design – Gary Murakami
Modeling Patrons
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  addresses: [
    {street: "1 Vernon St.", city: "Newton", state: "MA", …},
    {street: "52 Main St.", city: "Boston", state: "MA", …}
  ]
}




                             Schema Design – Gary Murakami
Publishers and Books
• Publishers put out many books
• Books have one publisher




                 Schema Design – Gary Murakami
Book


MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English

Publisher: O’Reilly Media, CA




                           Schema Design – Gary Murakami
Modeling Books – Embedded
Publisher

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
      name: "O’Reilly Media",
      founded: "1980",
      location: "CA"
  }
}


                       Schema Design – Gary Murakami
Modeling Books & Publisher
Relationship
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}



                       Schema Design – Gary Murakami
Publisher _id as a Foreign
Key
publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}

                       Schema Design – Gary Murakami
Book _id as a Foreign Key
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
  books: [ "123456789", ... ]
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

                       Schema Design – Gary Murakami
Where Do You Put the Foreign
Key?
• Array of books inside of publisher
   – Makes sense when many means a handful of items
   – Useful when items have bound on potential growth

• Reference to single publisher on books
   – Useful when items have unbounded growth (unlimited # of
    books)
• SQL doesn’t give you a choice, no arrays




                     Schema Design – Gary Murakami
Longest “Database Endgame”
Mate
• Retrograde analysis
  – Distance to mate (DTM)
  – Distance to conversion (DTC)

• Tablebase analysis
  – 6 piece – 262 moves, KRNKNN
  – 7 piece – 517 moves, so far
     • estimate 2015




                    Schema Design – Gary Murakami
Another Example
One to Many Relations




         Schema Design – Gary Murakami
Books and Patrons
• Book can be checked out by one Patron at a time
• Patrons can check out many books (but not
 1000’s)




                  Schema Design – Gary Murakami
Modeling Checkouts
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... }
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  ...
}



                       Schema Design – Gary Murakami
Modeling Checkouts
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
     { _id: "123456789", checked_out: "2012-10-15" },
     { _id: "987654321", checked_out: "2012-09-12" },
     ...
  ]
}




                       Schema Design – Gary Murakami
Denormalization
Provides data locality
De-normalize for speed




                         Schema Design – Gary Murakami
Modeling Checkouts:
 Denormalized
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
     { _id: "123456789",
        title: "MongoDB: The Definitive Guide",
        authors: [ "Kristina Chodorow", "Mike Dirolf" ],
        checked_out: ISODate("2012-10-15")
     },
     { _id: "987654321"
        title: "MongoDB: The Scaling Adventure",
                          ...
     }, ...
  ]
}
                           Schema Design – Gary Murakami
Referencing vs. Embedding
• Embedding is a bit like pre-joined data
• Document-level ops are easy for server to
 handle
• Embed when the 'many' objects always appear
 with (i.e. viewed in the context of) their parent
• Reference when you need more flexibility




                   Schema Design – Gary Murakami
An Example
Single Table Inheritance




         Schema Design – Gary Murakami
Single Table Inheritance

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  kind: "loanable",
  locations: [ ... ],
  pages: 216,
  language: "English",
  publisher: {
      name: "O’Reilly Media",
      founded: "1980",
      location: "CA"
  }
}

                       Schema Design – Gary Murakami
Can We Solve Chess
One Day?
• Chess tablebase problem
  – search is not localized, poor cache performance, seeks
  – competitive chess programs often play worse

• Endgame database size
  – 5 piece – 7 GB compressed 75%
     • 157 MB Shredderbase – 1000x
     • 441 MB Shredderbase – 10,000x
  – 6 piece – 1.2 TB compressed
  – 7 piece – 70 TB estimated by 2015



                     Schema Design – Gary Murakami
An Example
Many to Many Relations




         Schema Design – Gary Murakami
Relational Approach
          Schema Design – Gary Murakami
Books and Authors
book = {
  title: "MongoDB: The Definitive Guide",
  authors = [
      { _id: "kchodorow", name: "K-Awesome" },
      { _id: "mdirolf", name: "Batman Mike" },
  ]
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
                      Schema Design – Gary Murakami
An Example
Trees




         Schema Design – Gary Murakami
Parent Links

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  category: "MongoDB"
}

category = { _id: MongoDB, parent: "Databases" }
category = { _id: Databases, parent: "Programming" }




                       Schema Design – Gary Murakami
Child Links

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

category = { _id: MongoDB, children: [ 123456789, … ] }
category = { _id: Databases, children: ["MongoDB", "Postgres"}
category = { _id: Programming, children: ["DB", "Languages"] }




                             Schema Design – Gary Murakami
Modeling Trees
• Parent Links
  - Each node is stored as a document
  - Contains the id of the parent
• Child Links
  - Each node contains the id’s of the children
  - Can support graphs (multiple parents / child)



                  Schema Design – Gary Murakami
Array of Ancestors
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  categories: ["Programming", "Databases", "MongoDB” ]
}

book = {
  title: "MySQL: The Definitive Guide",
  authors: [ "Michael Kofler" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  parent: "MongoDB",
  ancestors: [ "Programming", "Databases", "MongoDB"]
}
                       Schema Design – Gary Murakami
An Example
Queues




         Schema Design – Gary Murakami
Book Document

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  available: 3
}

db.books.findAndModify({
   query: { _id: 123456789, available: { "$gt": 0 } },
   update: { $inc: { available: -1 } }
})

                           Schema Design – Gary Murakami
It’s All About Your
Application
• Programs + Databases = Applications
• Your Schema is the impedance matcher
  –   Melds programming with MongoDB
  –   Capitalizes on both programming language & DB
  –   Maximizes power transfer – best of both working together
  –   Flexible for development and change
• Programs X MongoDB = Great Big Applications
• Play chess with God


                       Schema Design – Gary Murakami
It’s All About Your
Application
• Programs + Databases = Applications
• Your Schema is the impedance matcher
  –   Melds programming with MongoDB
  –   Capitalizes on both programming language & DB
  –   Maximizes power transfer – best of both working together
  –   Flexible for development and change
• Programs X MongoDB = Great Big Applications
• Play music with God – AAC / PAC


                       Schema Design – Gary Murakami
#MongoDC
                  "His pattern indicates
                  two-dimensional thinking.”
                  -Spock
                  Star Trek II:
                  The Wrath of Khan


Thank You
Gary J. Murakami, Ph.D.                        www.3dchessfederation.com
Lead Engineer and Ruby Evangelist, 10gen (the MongoDB
company)

More Related Content

Similar to Schema Design

Schema & Design
Schema & DesignSchema & Design
Schema & Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
aaronheckmann
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema Design
Schema Design Schema Design
Schema Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema Design
Schema DesignSchema Design
Schema Design
MongoDB
 
Schema design
Schema designSchema design
Schema design
MongoDB
 

Similar to Schema Design (20)

Schema & Design
Schema & DesignSchema & Design
Schema & Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
MongoDB San Francisco 2013: Schema design presented by Jason Zucchetto, Consu...
MongoDB San Francisco 2013: Schema design presented by Jason Zucchetto, Consu...MongoDB San Francisco 2013: Schema design presented by Jason Zucchetto, Consu...
MongoDB San Francisco 2013: Schema design presented by Jason Zucchetto, Consu...
 
Schema Design
Schema DesignSchema Design
Schema Design
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Schema Design
Schema Design Schema Design
Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Webinar: Schema Design
Webinar: Schema DesignWebinar: Schema Design
Webinar: Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
Webinar: Schema Design
Webinar: Schema DesignWebinar: Schema Design
Webinar: Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Schema Design
Schema DesignSchema Design
Schema Design
 
Schema design
Schema designSchema design
Schema design
 
Schema Design by Gary Murakami
Schema Design by Gary MurakamiSchema Design by Gary Murakami
Schema Design by Gary Murakami
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Jumpstart: Schema Design
Jumpstart: Schema DesignJumpstart: Schema Design
Jumpstart: Schema Design
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Schema Design

  • 1. #MongoDC Schema Design Gary J. Murakami, Ph.D. Lead Engineer and Ruby Evangelist
  • 2. Algorithms – Data Structures xkcd.com =? Schema Design – Gary Murakami
  • 3. Chess (Northwestern University) Larry Atkin & Dave Slate Schema Design – Gary Murakami
  • 4. Agenda • Working with documents • Evolving a Schema • Queries and Indexes • Common Patterns Schema Design – Gary Murakami
  • 5. RDBMS MongoDB Database ➜ Database Table ➜ Collection Row ➜ Document Index ➜ Index Join ➜ Embedded Document Foreign Key ➜ Reference Terminology Schema Design – Gary Murakami
  • 7. Modeling Data Schema Design – Gary Murakami
  • 8. Documents Provide flexibility and performance Schema Design – Gary Murakami
  • 9. Normalized Data Schema Design – Gary Murakami
  • 10. De-Normalized (embedded) Data Schema Design – Gary Murakami
  • 11. Relational Schema Design Focus on data storage Schema Design – Gary Murakami
  • 12. Document Schema Design Focus on data use Schema Design – Gary Murakami
  • 13. Schema Design Considerations • How do we manipulate the data? – Dynamic Ad-Hoc Queries – Atomic Updates – Map Reduce • What are the access patterns of the application? – Read/Write Ratio – Types of Queries / Updates – Data life-cycle and growth rate Schema Design – Gary Murakami
  • 14. Data Manipulation • Query Selectors – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte – Vector: $in, $nin, $all, $size • Atomic Update Operators – Scalar: $inc, $set, $unset – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet Schema Design – Gary Murakami
  • 15. Data Access • Flexible Schemas • Ability to embed complex data structures • Secondary Indexes • Multi-Key Indexes • Aggregation Framework – $project, $match, $limit, $skip, $sort, $group, $unwind • No Joins Schema Design – Gary Murakami
  • 16. Endgame tablebase Play chess with God – Ken Thompson Schema Design – Gary Murakami
  • 18. Library Management Application • Patrons • Books • Authors • Publishers Schema Design – Gary J. Murakami
  • 19. An Example One to One Relations Schema Design – Gary Murakami
  • 20. Modeling Patrons patron = { patron = { _id: "joe", _id: "joe", name: "Joe Bookreader", name: "Joe Bookreader” address: { } street: "123 Fake St. ", city: "Faketon", address = { state: "MA", patron_id = "joe", zip: 12345 street: "123 Fake St. ", } city: "Faketon", } state: "MA", zip: 12345 } Schema Design – Gary Murakami
  • 21. One to One Relations • Mostly the same as the relational approach • Generally good idea to embed “contains” relationships • Document model provides a holistic representation of objects Schema Design – Gary Murakami
  • 22. An Example One To Many Relations Schema Design – Gary Murakami
  • 23. Modeling Patrons patron = { _id: "joe", name: "Joe Bookreader", join_date: ISODate("2011-10-15"), addresses: [ {street: "1 Vernon St.", city: "Newton", state: "MA", …}, {street: "52 Main St.", city: "Boston", state: "MA", …} ] } Schema Design – Gary Murakami
  • 24. Publishers and Books • Publishers put out many books • Books have one publisher Schema Design – Gary Murakami
  • 25. Book MongoDB: The Definitive Guide, By Kristina Chodorow and Mike Dirolf Published: 9/24/2010 Pages: 216 Language: English Publisher: O’Reilly Media, CA Schema Design – Gary Murakami
  • 26. Modeling Books – Embedded Publisher book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" } } Schema Design – Gary Murakami
  • 27. Modeling Books & Publisher Relationship publisher = { name: "O’Reilly Media", founded: "1980", location: "CA" } book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English" } Schema Design – Gary Murakami
  • 28. Publisher _id as a Foreign Key publisher = { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA" } book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", publisher_id: "oreilly" } Schema Design – Gary Murakami
  • 29. Book _id as a Foreign Key publisher = { name: "O’Reilly Media", founded: "1980", location: "CA" books: [ "123456789", ... ] } book = { _id: "123456789", title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English" } Schema Design – Gary Murakami
  • 30. Where Do You Put the Foreign Key? • Array of books inside of publisher – Makes sense when many means a handful of items – Useful when items have bound on potential growth • Reference to single publisher on books – Useful when items have unbounded growth (unlimited # of books) • SQL doesn’t give you a choice, no arrays Schema Design – Gary Murakami
  • 31. Longest “Database Endgame” Mate • Retrograde analysis – Distance to mate (DTM) – Distance to conversion (DTC) • Tablebase analysis – 6 piece – 262 moves, KRNKNN – 7 piece – 517 moves, so far • estimate 2015 Schema Design – Gary Murakami
  • 32. Another Example One to Many Relations Schema Design – Gary Murakami
  • 33. Books and Patrons • Book can be checked out by one Patron at a time • Patrons can check out many books (but not 1000’s) Schema Design – Gary Murakami
  • 34. Modeling Checkouts patron = { _id: "joe", name: "Joe Bookreader", join_date: ISODate("2011-10-15"), address: { ... } } book = { _id: "123456789", title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], ... } Schema Design – Gary Murakami
  • 35. Modeling Checkouts patron = { _id: "joe", name: "Joe Bookreader", join_date: ISODate("2011-10-15"), address: { ... }, checked_out: [ { _id: "123456789", checked_out: "2012-10-15" }, { _id: "987654321", checked_out: "2012-09-12" }, ... ] } Schema Design – Gary Murakami
  • 36. Denormalization Provides data locality De-normalize for speed Schema Design – Gary Murakami
  • 37. Modeling Checkouts: Denormalized patron = { _id: "joe", name: "Joe Bookreader", join_date: ISODate("2011-10-15"), address: { ... }, checked_out: [ { _id: "123456789", title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], checked_out: ISODate("2012-10-15") }, { _id: "987654321" title: "MongoDB: The Scaling Adventure", ... }, ... ] } Schema Design – Gary Murakami
  • 38. Referencing vs. Embedding • Embedding is a bit like pre-joined data • Document-level ops are easy for server to handle • Embed when the 'many' objects always appear with (i.e. viewed in the context of) their parent • Reference when you need more flexibility Schema Design – Gary Murakami
  • 39. An Example Single Table Inheritance Schema Design – Gary Murakami
  • 40. Single Table Inheritance book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), kind: "loanable", locations: [ ... ], pages: 216, language: "English", publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" } } Schema Design – Gary Murakami
  • 41. Can We Solve Chess One Day? • Chess tablebase problem – search is not localized, poor cache performance, seeks – competitive chess programs often play worse • Endgame database size – 5 piece – 7 GB compressed 75% • 157 MB Shredderbase – 1000x • 441 MB Shredderbase – 10,000x – 6 piece – 1.2 TB compressed – 7 piece – 70 TB estimated by 2015 Schema Design – Gary Murakami
  • 42. An Example Many to Many Relations Schema Design – Gary Murakami
  • 43. Relational Approach Schema Design – Gary Murakami
  • 44. Books and Authors book = { title: "MongoDB: The Definitive Guide", authors = [ { _id: "kchodorow", name: "K-Awesome" }, { _id: "mdirolf", name: "Batman Mike" }, ] published_date: ISODate("2010-09-24"), pages: 216, language: "English" } author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" } Schema Design – Gary Murakami
  • 45. An Example Trees Schema Design – Gary Murakami
  • 46. Parent Links book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", category: "MongoDB" } category = { _id: MongoDB, parent: "Databases" } category = { _id: Databases, parent: "Programming" } Schema Design – Gary Murakami
  • 47. Child Links book = { _id: 123456789, title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English" } category = { _id: MongoDB, children: [ 123456789, … ] } category = { _id: Databases, children: ["MongoDB", "Postgres"} category = { _id: Programming, children: ["DB", "Languages"] } Schema Design – Gary Murakami
  • 48. Modeling Trees • Parent Links - Each node is stored as a document - Contains the id of the parent • Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child) Schema Design – Gary Murakami
  • 49. Array of Ancestors book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", categories: ["Programming", "Databases", "MongoDB” ] } book = { title: "MySQL: The Definitive Guide", authors: [ "Michael Kofler" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", parent: "MongoDB", ancestors: [ "Programming", "Databases", "MongoDB"] } Schema Design – Gary Murakami
  • 50. An Example Queues Schema Design – Gary Murakami
  • 51. Book Document book = { _id: 123456789, title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", available: 3 } db.books.findAndModify({ query: { _id: 123456789, available: { "$gt": 0 } }, update: { $inc: { available: -1 } } }) Schema Design – Gary Murakami
  • 52. It’s All About Your Application • Programs + Databases = Applications • Your Schema is the impedance matcher – Melds programming with MongoDB – Capitalizes on both programming language & DB – Maximizes power transfer – best of both working together – Flexible for development and change • Programs X MongoDB = Great Big Applications • Play chess with God Schema Design – Gary Murakami
  • 53. It’s All About Your Application • Programs + Databases = Applications • Your Schema is the impedance matcher – Melds programming with MongoDB – Capitalizes on both programming language & DB – Maximizes power transfer – best of both working together – Flexible for development and change • Programs X MongoDB = Great Big Applications • Play music with God – AAC / PAC Schema Design – Gary Murakami
  • 54. #MongoDC "His pattern indicates two-dimensional thinking.” -Spock Star Trek II: The Wrath of Khan Thank You Gary J. Murakami, Ph.D. www.3dchessfederation.com Lead Engineer and Ruby Evangelist, 10gen (the MongoDB company)

Editor's Notes

  1. In the filing cabinet model, the patient’s x-rays, checkups, and allergies are stored in separate drawers and pulled together (like an RDBMS)In the file folder model, we store all of the patient information in a single folder (like MongoDB)
  2. Flexibility – Ability to represent rich data structuresPerformance – Benefit from data locality
  3. Concrete example of typical blog in typical tionaltional normalized form
  4. Concrete example of typical blog using a document oriented de-normalized approach
  5. Tools for data manipulation
  6. Tools for data access
  7. Slow to get address data every time you query for a user. Requires an extra operation.
  8. Publisher is repeated for every book, data duplication!
  9. Publisher is better being a separate entity and having its own collection.
  10. Now to create a relation between the two entities, you can choose to reference the publisher from the book document.This is similar to the relational approach for this very same problem.
  11. OR: because we are using MongoDB and documents can have arrays you can choose to model the relation by creating and maintaining an array of books within each publisher entity.Careful with mutable, growing arrays. See next slide.
  12. Costly for a small number of books because to get the publisher
  13. And data locality provides speed
  14. Book’s kind attribute could be local or loanableNote that we have locations for loanable books but not for localNote that these two separate schemas can co-exist (loanable books / local books are both books)
  15. Kristine to update this graphic at some point
  16. Authors often use pseudonyms for a book even though it’s the same individualTo get books by a particular author: - get the author - get books that have that author id in array