MongoDB Schema Design: Insights and Tradeoffs

Montse Medina
COO,

Saturday, May 5, 12
Social content is useful in context

Social context is useful in context

Algorithms + Infrastructure

Technology Stack
    Apache Kafka

Outline
    I. Schema design
        ‣ Relational vs. Document-oriented
        ‣ Schema-less design
        ‣ Case study: Publishers & Subscribers
    II. Lessons learned for schema design
    III. Things to remember about MongoDB

Relational vs. Document-oriented

    Relational:

        Users                 Graph
        id    name            from    to
        1     Robert          1       5
        2     Monica          1       20
        3     Lucas           2       1
        ...   ...             2       5
                              ...     ...

    vs

    Document-oriented:

        Users
        { id: 1, name: "Robert", from: [2], to: [5, 20] }
        { id: 2, name: "Monica", from: [23], to: [1, 5] }
        ...

Find all the "to" edges for user 5

    Relational:

        Graph (rows stored across disk blocks)
        from    to
        1       5
        1       20
        2       1
        2       5
        3       4
        3       23
        3       12
        4       5
        ...     ...

        Potentially as many disk seeks as "to" edges!

    vs

    Document-oriented:

        Users
        { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }

        1 disk seek guaranteed!

Advantages of doc-oriented schema
    • Avoid joins
    • Disk locality when fetching relations (everything is stored within a doc record)

Considerations for schema design
    • N-to-many relations == lists
    • Denormalization is more common

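The difference between the two layouts can be sketched in plain JavaScript, with arrays standing in for tables and collections. This is an illustrative model only: the data is the sample from the slides, and the function names are made up for this example.

```javascript
// Relational layout: users and edges live in separate "tables".
const graphRows = [
  { from: 1, to: 5 }, { from: 1, to: 20 },
  { from: 2, to: 1 }, { from: 2, to: 5 },
];

// Document layout: each user carries its own edge lists.
const userDocs = [
  { id: 1, name: "Robert", from: [2], to: [5, 20] },
  { id: 2, name: "Monica", from: [23], to: [1, 5] },
];

// Relational: walk the edge table, touching one row per edge.
function toEdgesRelational(userId) {
  return graphRows.filter((row) => row.from === userId).map((row) => row.to);
}

// Document: fetch one record; the list is already inside it.
function toEdgesDocument(userId) {
  const doc = userDocs.find((d) => d.id === userId);
  return doc ? doc.to : [];
}

console.log(toEdgesRelational(1)); // [ 5, 20 ]
console.log(toEdgesDocument(1));   // [ 5, 20 ]
```

Both functions return the same edges; the document version gets them from a single record instead of one row per edge.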
Schema-less design
    { id: 1, network: Twitter, name: "Robert",
      from: [2], to: [5, 20], screenName: "robertE" }

    { id: 2, network: Facebook, name: "Maria",
      from: [23], to: [1, 5], likes: ["biking", "hiking"] }
    ...

    Leverage the schemaless nature of Mongo, but put protection with types in your code!

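One way to "put protection in your code" is a small validation step that runs before a document is written. The sketch below is an assumption about what such a check could look like; the rule tables and the validateUser name are hypothetical, and only the field names come from the sample documents.

```javascript
// Hypothetical per-network field rules; field names follow the sample documents.
const commonFields = { id: "number", name: "string" };
const networkFields = {
  Twitter: { screenName: "string" },
  Facebook: {},
};

// Returns a list of problems; an empty list means the document passed.
function validateUser(doc) {
  const problems = [];
  const extra = networkFields[doc.network];
  if (extra === undefined) {
    problems.push("unknown network: " + doc.network);
    return problems;
  }
  const required = Object.assign({}, commonFields, extra);
  for (const [field, type] of Object.entries(required)) {
    if (typeof doc[field] !== type) {
      problems.push(field + " must be a " + type);
    }
  }
  for (const listField of ["from", "to"]) {
    if (!Array.isArray(doc[listField])) {
      problems.push(listField + " must be a list");
    }
  }
  return problems;
}

console.log(validateUser({
  id: 1, network: "Twitter", name: "Robert",
  from: [2], to: [5, 20], screenName: "robertE",
})); // []
```

Optional fields such as likes stay unchecked here, which keeps the flexibility of the schemaless store while still guarding the fields the application relies on.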
Case Study: Publishers & Subscribers

Read-Friendly Approach
    (diagram: one copy of the "Hi!" post per recipient)

    Post:
    { _id: postId,
      owner: ownerId,
      recipient: recipientId,
      text: "message", ... }

    db.posts.find({recipient: uid})

    Sharding key: recipient

    Fast retrieval, easy sharding
    Slow writes, enormous amount of storage

Write-Friendly Approach
    (diagram: a single "Hi!" post, stored once)

    Post:
    { _id: postId,
      owner: oId,
      text: "message", ... }

    db.posts.find({owner: {$in: user.from}})

    Sharding key: ?

    Fast writes, slim storage
    Slow reads, harder queries

Hybrid Approach
    (diagram: a single "Hi!" post that carries its list of recipients)

    Post:
    { _id: postId,
      owner: ownerId,
      recipients: [u1, u2, u3, u5],
      text: "message", ... }

    db.posts.find({recipients: uId})

    Sharding key: random :)

    Fast writes, slim storage, reasonable read speed

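The storage cost of the three approaches can be compared with a small in-memory sketch. The publisher, the subscriber list, and the function names are hypothetical; the document shapes follow the Post examples in the case study.

```javascript
// Hypothetical publisher with three subscribers.
const subscribers = ["u1", "u2", "u3"];

// Read-friendly: one copy of the post per recipient.
function readFriendly(post) {
  return subscribers.map((uid) => ({ owner: post.owner, recipient: uid, text: post.text }));
}

// Write-friendly: a single document; recipients are resolved at read time.
function writeFriendly(post) {
  return [{ owner: post.owner, text: post.text }];
}

// Hybrid: a single document that embeds the recipient list.
function hybrid(post) {
  return [{ owner: post.owner, recipients: subscribers.slice(), text: post.text }];
}

const post = { owner: "pub1", text: "Hi!" };
console.log(readFriendly(post).length);  // 3 documents written
console.log(writeFriendly(post).length); // 1 document written
console.log(hybrid(post).length);        // 1 document written

// Reading u2's feed from the hybrid layout: match on list membership,
// which is what db.posts.find({recipients: uId}) expresses.
const feed = hybrid(post).filter((doc) => doc.recipients.includes("u2"));
console.log(feed.length); // 1
```

The read-friendly layout writes one document per subscriber, while the hybrid layout writes once and still answers a per-user query with a single membership match.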
Random sharding is not random!
    Best -- Impossible for our data
    Worse
    Optimal solution

    Minimize the number of disk seeks per shard!

Outline
    I. Schema design
    II. Lessons learned for schema design
        ‣ Indexes
        ‣ Concurrency
        ‣ Reducing collection size
    III. Things to remember about MongoDB

Indexes
    Primary Key

    link: {
        _id: ObjectId(...),
        url: "www.jetlore.com",
        title: "Jetlore is a search platform for social content",
        description: "..."
    }

    If your data has a natural PK, use it instead of the default ObjectId

    link: {
        _id: "www.jetlore.com",
        title: "Jetlore is a search platform for social content",
        description: "..."
    }

Indexes
    Augment your schema to enable the most selective index

    Want all posts that a user can view sorted by the number of likes

    post: {
        _id: ObjectId(...),
        recipients: [...],
        likes: [...],
        likesCount: ...,
        ... }

    Add a new "likesCount" field!
    db.posts.ensureIndex({recipients: 1, likesCount: -1})

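A denormalized counter such as likesCount only helps if it stays in step with the likes list. One common pattern, shown here as a sketch rather than as the speaker's method, is to guard the update with a filter so that the list and the counter change together. The $ne, $push and $inc operators are standard MongoDB operators; buildLikeUpdate and applyLike are hypothetical helpers, and applyLike is only an in-memory stand-in for the server.

```javascript
// Builds the arguments for db.posts.update(filter, update).
// The filter only matches when userId has not liked the post yet,
// so the list and the counter change together or not at all.
function buildLikeUpdate(postId, userId) {
  return {
    filter: { _id: postId, likes: { $ne: userId } },
    update: { $push: { likes: userId }, $inc: { likesCount: 1 } },
  };
}

// In-memory stand-in for what the server would do with that update.
function applyLike(post, userId) {
  if (post.likes.includes(userId)) return false; // filter did not match
  post.likes.push(userId);
  post.likesCount += 1;
  return true;
}

const likedPost = { _id: "p1", recipients: ["u1"], likes: [], likesCount: 0 };
applyLike(likedPost, "u7");
applyLike(likedPost, "u7"); // a repeated like is a no-op
console.log(likedPost.likesCount); // 1
```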
Indexes
    Make sure to use the proper index

    db.posts.find({recipients: uId}).sort({date: -1})

    db.posts.ensureIndex({recipients: 1})
    db.posts.ensureIndex({date: 1})

    vs

    db.posts.ensureIndex({recipients: 1, date: 1})    (slide annotation: date: -1)

    Always test with explain()

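Why the compound index fits this query can be seen in a toy model. The sketch below is not how MongoDB stores indexes; it only imitates the ordering of a {recipients: 1, date: -1} index with a sorted array, using made-up posts with numeric dates. (Recent shells spell ensureIndex as createIndex.)

```javascript
// Toy posts; dates are plain numbers for brevity.
const samplePosts = [
  { _id: "a", recipients: ["u1", "u2"], date: 3 },
  { _id: "b", recipients: ["u2"], date: 1 },
  { _id: "c", recipients: ["u1"], date: 2 },
  { _id: "d", recipients: ["u1"], date: 5 },
];

// Model of {recipients: 1, date: -1}: one entry per (recipient, date) pair,
// ordered by recipient ascending, then date descending.
const compoundIndex = samplePosts
  .flatMap((p) => p.recipients.map((r) => ({ recipient: r, date: p.date, _id: p._id })))
  .sort((x, y) => x.recipient.localeCompare(y.recipient) || y.date - x.date);

// Entries for one recipient are contiguous and already in date order,
// so the query needs no separate sort step.
function newestFirst(uid) {
  return compoundIndex.filter((e) => e.recipient === uid).map((e) => e._id);
}

console.log(newestFirst("u1")); // [ 'd', 'a', 'c' ]
```

With two single-field indexes, the matching posts would still have to be sorted by date after they are found; explain() on the real query shows which plan is chosen.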
Concurrency
    Try to avoid "save()" in drivers

    thread1: { _id: u1,                thread2: { _id: u1,
               name: "Robert",                    name: "Bob",
               from: [u2, u3] }                   from: [] }

    db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
    db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})

    ...but!
    db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)

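The reason to avoid whole-document saves can be shown with an in-memory model of two threads that each hold a copy of the same user. This is a sketch under the assumption that save() replaces the stored document with the caller's copy; saveWhole and setFields are hypothetical stand-ins, not driver calls.

```javascript
// In-memory stand-in for one stored user document.
let stored = { _id: "u1", name: "Robert", from: [] };

// save()-style write: replaces the whole document with the caller's copy.
function saveWhole(copy) {
  stored = JSON.parse(JSON.stringify(copy));
}

// $set-style write: touches only the named fields.
function setFields(fields) {
  Object.assign(stored, JSON.parse(JSON.stringify(fields)));
}

// Both threads read the same starting document.
const thread1 = JSON.parse(JSON.stringify(stored));
const thread2 = JSON.parse(JSON.stringify(stored));
thread1.from = ["u2", "u3"]; // thread 1 edits the list
thread2.name = "Bob";        // thread 2 edits the name

saveWhole(thread1);
saveWhole(thread2);
const afterSave = JSON.stringify(stored); // thread 1's list is gone

stored = { _id: "u1", name: "Robert", from: [] };
setFields({ from: thread1.from });
setFields({ name: thread2.name });
const afterSet = JSON.stringify(stored); // both edits survive

console.log(afterSave); // {"_id":"u1","name":"Bob","from":[]}
console.log(afterSet);  // {"_id":"u1","name":"Bob","from":["u2","u3"]}
```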
Concurrency
    Atomic Commutative Operators

    db.users.update({_id: u1}, {$pull: {to: u2}})
    db.posts.update({_id: pId}, {$inc: {likesCount: 1}})

    When updating lists and counters, instead of using $set, rely on
    $inc, $addToSet, $pull

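"Commutative" here means the final document does not depend on the order in which concurrent updates arrive. The sketch below models $inc and $set in memory to show the difference; the operators table and applyAll are hypothetical helpers written for this example.

```javascript
// Minimal in-memory versions of two update operators.
const operators = {
  $set: (doc, field, value) => { doc[field] = value; },
  $inc: (doc, field, amount) => { doc[field] = (doc[field] || 0) + amount; },
};

// Applies a sequence of [operator, field, value] updates to a copy of doc.
function applyAll(doc, updates) {
  const copy = JSON.parse(JSON.stringify(doc));
  for (const [op, field, value] of updates) operators[op](copy, field, value);
  return copy;
}

const start = { likesCount: 10 };
const incA = ["$inc", "likesCount", 1];
const incB = ["$inc", "likesCount", 2];
const setA = ["$set", "likesCount", 11]; // client A read 10, writes 10 + 1
const setB = ["$set", "likesCount", 12]; // client B read 10, writes 10 + 2

console.log(applyAll(start, [incA, incB]).likesCount); // 13
console.log(applyAll(start, [incB, incA]).likesCount); // 13
console.log(applyAll(start, [setA, setB]).likesCount); // 12
console.log(applyAll(start, [setB, setA]).likesCount); // 11
```

The $inc results agree in both orders, while the $set results depend on which client writes last and never reach the intended 13.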
Concurrency
    No Transactions

    user1: { _id: u1,
             to: [u2, u3],
             from: [...], ...}

    user2: { _id: u2,
             to: [...],
             from: [u1, ...], ...}

    User1 wants to unsubscribe from user2.
    Ideally we would update both users in one transaction.

    Implement it in your code

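One way to "implement it in your code" is to make each of the two writes idempotent, so that a failed attempt can simply be retried. The sketch below is an assumption about how that could look, with an in-memory object in place of the users collection; pull and unsubscribe are hypothetical helpers. Newer MongoDB releases do offer multi-document transactions; this follows the slide's advice for when they are not in play.

```javascript
// In-memory stand-in for the users collection.
const users = {
  u1: { _id: "u1", to: ["u2", "u3"], from: [] },
  u2: { _id: "u2", to: [], from: ["u1"] },
};

// Stand-in for db.users.update({_id: id}, {$pull: {[field]: value}}).
// Pulling a value that is already gone changes nothing.
function pull(id, field, value) {
  users[id][field] = users[id][field].filter((v) => v !== value);
}

// Two independent writes. Because each $pull is idempotent, a caller that
// fails between them can run the whole function again.
function unsubscribe(subscriberId, publisherId) {
  pull(subscriberId, "to", publisherId);
  pull(publisherId, "from", subscriberId);
}

unsubscribe("u1", "u2");
unsubscribe("u1", "u2"); // a retry leaves the same state
console.log(users.u1.to);   // [ 'u3' ]
console.log(users.u2.from); // []
```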
Reducing collection size
    Name your fields with short names!

    post: {
        owner: ObjectId,
        messageText: "loving Jetlore",
        mediaUrl: "www.jetlore.com",
        mediaTitle: "Jetlore is a user analytics & search platform for social content"
    }

    vs

    post: {
        o: ObjectId,
        t: "loving Jetlore",
        mu: "www.jetlore.com",
        mt: "Jetlore is a user analytics & search platform for social content"
    }

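The saving from short field names can be estimated with a quick measurement. The sketch below uses the byte length of the JSON text as a rough proxy; actual BSON sizes differ, and the 24-character owner string is only a placeholder for an ObjectId.

```javascript
const title = "Jetlore is a user analytics & search platform for social content";
const ownerPlaceholder = "0".repeat(24); // stand-in for an ObjectId value

const longNames = {
  owner: ownerPlaceholder,
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: title,
};
const shortNames = {
  o: ownerPlaceholder,
  t: "loving Jetlore",
  mu: "www.jetlore.com",
  mt: title,
};

// Rough size estimate: UTF-8 bytes of the JSON text. BSON sizes differ,
// but field names are repeated in every stored document either way.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const savedPerDoc = approxBytes(longNames) - approxBytes(shortNames);
console.log(savedPerDoc); // 28
```

The difference is small per document but is paid again for every document in the collection.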
Outline
    I. Schema design
    II. Lessons learned for schema design
    III. Things to remember about MongoDB
        ‣ Single lock
        ‣ ($or + sort) query doesn't use indexes properly
        ‣ Indexes with 2 list fields
        ‣ Record iterators + update

$or & sort query doesn't use the proper index
    db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})

    db.posts.ensureIndex({recipients: 1, date: -1})
    db.posts.ensureIndex({privacy: 1, date: -1})

Indexes with 2 list fields
    post: { _id: ObjectId(...),
            recipients: [...],
            links: [...],
            ... }

    db.posts.ensureIndex({recipients: 1, links: 1})

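For the $or + sort case, one possible workaround (an assumption, not something the slides prescribe) is to run each branch of the $or as its own sorted query, so that each can use its own compound index, and then merge the two result lists in application code. The sketch below shows only the merge step, with hard-coded arrays standing in for the two query results.

```javascript
// Merges two result lists that are each sorted by date descending,
// dropping posts that appear in both (matched by _id).
function mergeNewestFirst(a, b) {
  const merged = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      merged.push(next);
    }
  }
  return merged;
}

// Stand-ins for the results of the two single-condition queries.
const forUser = [{ _id: "p4", date: 9 }, { _id: "p2", date: 4 }];
const publicPosts = [{ _id: "p3", date: 7 }, { _id: "p2", date: 4 }, { _id: "p1", date: 1 }];

console.log(mergeNewestFirst(forUser, publicPosts).map((p) => p._id));
// [ 'p4', 'p3', 'p2', 'p1' ]
```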
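Record iterators + updating
      // Problem: updating documents while iterating an unsorted cursor
      // may revisit or skip documents that the update moves
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }

      Sort by a field that will not change or rename the old collection

      // Option 1: iterate in the order of a field the update does not touch
      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

      // Option 2: read from the renamed collection and write into a fresh "posts"
      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        post.text = NewText
        db.posts.insert(post)  // "posts" is empty after the rename, so update() would match nothing
      }
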
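The hazard can be illustrated with a toy model. The sketch below is not how MongoDB's storage engine works; it only assumes that an update can relocate a document, and models that by moving the updated document to the end of an array. All function names are made up for this example.

```javascript
// Toy storage in "disk order". An update relocates the document to the end,
// standing in for a document that moves when it is rewritten.
function makeStore(n) {
  return Array.from({ length: n }, (_, i) => ({ _id: i, text: "old" }));
}

function updateAndMove(store, id) {
  const pos = store.findIndex((d) => d._id === id);
  const [doc] = store.splice(pos, 1);
  doc.text = "new";
  store.push(doc);
}

// Walks storage positions while updating: some documents are visited
// repeatedly and others never.
function naivePass(store) {
  const visited = [];
  for (let pos = 0; pos < store.length; pos++) {
    const id = store[pos]._id;
    visited.push(id);
    updateAndMove(store, id);
  }
  return visited;
}

// Walks a fixed ordering on _id, which the update never changes.
function stablePass(store) {
  const ids = store.map((d) => d._id).sort((a, b) => a - b);
  ids.forEach((id) => updateAndMove(store, id));
  return ids;
}

console.log(naivePass(makeStore(4)));  // [ 0, 2, 0, 0 ]
console.log(stablePass(makeStore(4))); // [ 0, 1, 2, 3 ]
```

In the naive pass, documents 1 and 3 are never updated, while the pass ordered by an unchanging field touches each document exactly once.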
The takeaways
    I. What is more important?
        • Writes: Optimize for easy inserts/updates
        • Reads: Optimize for easy querying
    II. Denormalize to enable the most selective index
    III. Concurrency: design to leverage commutative operators

Thank you!
    Try our tech

Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

  • 1. MongoDB Schema Design: Insights and Tradeoffs Montse Medina COO, Saturday, May 5, 12
  • 2. Social content is useful in context
  • 3. Social context is useful in context
  • 4. Algorithms + Infrastructure
  • 5. Technology Stack Apache Kafka
  • 6. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 7. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 8. Relational vs. Document-oriented. Relational: Users (id, name): (1, Robert), (2, Monica), (3, Lucas), ...; Graph (from, to): (1, 5), (1, 20), (2, 1), (2, 5), ... vs. Document-oriented: Users { id: 1, name: "Robert", from: [2], to: [5, 20] }, { id: 2, name: "Monica", from: [23], to: [1, 5] }, ...
  • 9. Find all the "to" edges for user 5. Relational: Graph (from, to), stored in blocks: (1, 5), (1, 20), (2, 1), (2, 5), (3, 4), (3, 23), (3, 12), (4, 5), ... Potentially as many disk seeks as "to" edges! vs. Document-oriented: Users { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }. 1 disk seek guaranteed!
  • 10. Advantages of doc-oriented schema: • Avoid joins • Disk locality when fetching relations (everything is stored within a doc record). Considerations for schema design: • N to Many relations == Lists • Denormalization is more common
  • 11. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 12. Schema-less design { id: 1, network: Twitter, name: "Robert", from: [2], to: [5, 20], screenName: "robertE" } { id: 2, network: Facebook, name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] } ... Leverage the schemaless nature of Mongo, but put protection with types in your code!
  • 13. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 14. Read-Friendly Case Study: Publishers & Subscribers
  • 15. Read-Friendly Approach Hi! Hi! Hi! Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
  • 16. Read-Friendly Approach: db.posts.find({recipient: uid}). Sharding Key: recipient. Fast retrieval, easy sharding. Slow writes, enormous amount of storage.
  • 17. Write-Friendly Case Study: Publishers & Subscribers
  • 18. Write-Friendly Approach Hi! Post: { _id: postId, owner: oId, text: "message", ... }
  • 19. Write-Friendly Approach: db.posts.find({owner: {$in: user.from}}). Sharding Key: ? Fast writes, slim storage. Slow reads, harder queries.
  • 20. Hybrid Approach Case Study: Publishers & Subscribers
  • 21. Hybrid Approach Hi! Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
  • 22. Hybrid Approach: db.posts.find({recipients: uId}). Sharding Key: random :) Fast writes, slim storage, reasonable read speed.
  • 23. Random sharding is not random! Best (impossible for our data). Worse. Optimal solution. Minimize the number of disk seeks per shard!
  • 24. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 25. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 26. Indexes: Primary Key. link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." } vs. link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." } If your data has a natural PK, use it instead of the default ObjectId.
  • 27. Indexes: Augment your schema to enable the most selective index. Want all posts that a user can view sorted by the number of likes. post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... } Add a new "likesCount" field! db.posts.ensureIndex({recipients: 1, likesCount: -1})
  • 28. Indexes: Make sure to use the proper index. db.posts.find({recipients: uId}).sort({date: -1}) Always test with explain(). db.posts.ensureIndex({recipients: 1}) db.posts.ensureIndex({date: 1}) vs date: -1 db.posts.ensureIndex({recipients: 1, date: 1})
  • 29. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 30. Concurrency: Try to avoid "save()" in drivers. thread1: { _id: u1, name: "Robert", from: [u2, u3] } thread2: { _id: u1, name: "Bob", from: [] } db.users.update({_id: thread1._id}, {$set: {from: thread1.from}}) db.users.update({_id: thread2._id}, {$set: {name: thread2.name}}) …but! db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)
  • 31. Concurrency: Atomic Commutative Operators. db.users.update({_id: u1}, {$pull: {to: u2}}) db.posts.update({_id: pId}, {$inc: {likesCount: 1}}) When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull.
  • 32. Concurrency: No Transactions. user1: { _id: u1, to: [u2, u3], from: [...], ... } user2: { _id: u2, to: [...], from: [u1, ...], ... } User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction. Implement it in your code.
  • 33. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 34. Reducing collection size: Name your fields with short names! post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" } vs. post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
  • 35. Outline I. Schema design II. Lessons learned for schema design III. Things to remember about MongoDB ‣ Single lock ‣ ($or + sort) query doesn't use indexes properly ‣ Indexes with 2 list fields ‣ Record iterators + update
  • 36. $or & sort query doesn't use the proper index: db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1}) db.posts.ensureIndex({recipients: 1, date: -1}) db.posts.ensureIndex({privacy: 1, date: -1}) Indexes with 2 list fields: post: { _id: ObjectId(...), recipients: [...], links: [...], ... } db.posts.ensureIndex({recipients: 1, links: 1})
  • 37. Record iterators + updating: var posts = db.posts.find().skip(n).limit(t) while (posts.hasNext()) { var post = posts.next() db.posts.update({_id: post._id}, {$set: {text: NewText}}) } Sort by a field that will not change or rename the old collection. Option 1: var posts = db.posts.find().sort({date: 1}).skip(n).limit(t) Option 2: db.posts.renameCollection("oldPosts") var posts = db.oldPosts.find().skip(n).limit(t) while (posts.hasNext()) { var post = posts.next() db.posts.update({_id: post._id}, {$set: {text: NewText}}) }
  • 38. The takeaways: I. What is more important? • Writes: Optimize for easy inserts/updates • Reads: Optimize for easy querying II. Denormalize to enable the most selective index III. Concurrency: design to leverage commutative operators
  • 39. Thank you! Try our tech powered by
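
The concurrency advice on slides 30 and 31 (avoid whole-document save(), prefer $inc, $addToSet and $pull) can be illustrated without a database. The following is a minimal in-memory sketch in plain JavaScript, not MongoDB driver code: `applyUpdate` is a hypothetical helper that only imitates the semantics of those four operators, and the sample user document is made up for illustration.

```javascript
// Hypothetical in-memory model of a few MongoDB-style update operators.
// This is NOT a driver API; it only mimics the operator semantics.
function applyUpdate(doc, update) {
  // Deep copy so the caller's snapshot is never mutated.
  const out = JSON.parse(JSON.stringify(doc));
  for (const [field, value] of Object.entries(update.$set || {})) {
    out[field] = value;
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    out[field] = (out[field] || 0) + delta;
  }
  for (const [field, item] of Object.entries(update.$addToSet || {})) {
    const list = out[field] || [];
    if (!list.includes(item)) list.push(item);
    out[field] = list;
  }
  for (const [field, item] of Object.entries(update.$pull || {})) {
    out[field] = (out[field] || []).filter((x) => x !== item);
  }
  return out;
}

// Assumed sample document, loosely based on the users on slide 30.
const stored = { _id: "u1", name: "Robert", from: ["u2"], likesCount: 0 };

// Whole-document writes: both threads read the same snapshot, change a
// different field locally, then write the full document back.
const thread1 = { ...stored, from: ["u2", "u3"] };
const thread2 = { ...stored, name: "Bob" };
let afterSave = thread1; // thread1 saves first
afterSave = thread2;     // thread2 saves last and overwrites thread1's "from"

// Field-level updates: each writer sends only its own change.
const ops = [
  { $addToSet: { from: "u3" } },
  { $set: { name: "Bob" } },
  { $inc: { likesCount: 1 } },
  { $inc: { likesCount: 1 } },
];
const inOrder = ops.reduce(applyUpdate, stored);
const reversed = [...ops].reverse().reduce(applyUpdate, stored);

console.log(afterSave.from);    // thread1's new subscription is gone
console.log(inOrder, reversed); // same final document in either order
```

In this sketch the order of the field-level updates does not matter because each one touches a different field or uses a commutative operator. Two $set calls on the same field would still be last-writer-wins, which is why the slides recommend $inc, $addToSet and $pull for counters and lists.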