How Business Insider Uses MongoDB
The NoSQL Approach in Practice
Ian White
GilbaneSF
May 20th, 2010
Business Insider
        www.businessinsider.com



• Multiple business verticals (tech, markets)
• Dedicated full-time editorial staff
• 3k original stories and 50k aggregated items
  per month
• > 1.2m pageviews per day
• custom CMS powered by LAMP
LAMP
Linux    OS

Apache   Webserver

MySQL    Database

PHP      Application
LAMP
at Business Insider
  Linux     OS

  Apache    Webserver

  MongoDB   Database

  PHP       Application
What’s MongoDB?
• NoSQL Database
• Open Source (supported by 10gen)
• Document-oriented
• Schema-free
• Highly horizontally scalable
• Dynamically queryable
NoSQL Means

non-relational, next-generation
operational datastores and
databases
  • Document-oriented: CouchDB, MongoDB
  • Graph: Neo4J
  • Key/value: Cassandra, Redis
NoSQL
  no joins
+ no complex transactions
= horizontally scalable architectures
[Chart: scalability & performance (vertical axis) against depth of functionality (horizontal axis). Plotted from top left to bottom right: memcached, key/value stores, RDBMS.]
Simplified Blog:
                a SQL approach
                       Three tables of rows:
      posts        posts_comments          posts_tags
      post_id      post_id                 post_id
      name         content                 name
      content

INSERT INTO posts (post_id, name, content)
VALUES (42, 'Lost Series Finale Approaching!',
        '<p>It''s going to be pretty exciting.</p>');

INSERT INTO posts_comments (post_id, content) VALUES (42, 'Cool!');
INSERT INTO posts_comments (post_id, content) VALUES (42, 'Awesome!');

INSERT INTO posts_tags (post_id, name) VALUES (42, 'Lost');
INSERT INTO posts_tags (post_id, name) VALUES (42, 'Television');
Simplified Blog:
 a MongoDB approach
             One collection of documents:
                       posts
                       _id
                       name
                       content
                       comments
                       tags

db.posts.insert( {
  _id: 42,
  name: "Lost Series Finale Approaching!",
  content: "<p>It's going to be pretty exciting.</p>",
  comments: [ { content: "Cool!" },
              { content: "Awesome!" }
            ],
  tags: ["Lost", "Television"]
} );
Simplified Blog:
            Queries
   Get a post and all its tags and comments

                        SQL
SELECT * FROM posts WHERE post_id = 42;
SELECT * FROM posts_comments WHERE post_id = 42;
SELECT * FROM posts_tags WHERE post_id = 42;


                     MongoDB
db.posts.findOne( {_id: 42} );
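
To make the contrast concrete, here is a minimal sketch in plain JavaScript of the stitching an application does after the three SELECTs. The arrays stand in for SQL result sets and `assemblePost` is a hypothetical helper, not part of any driver; the object it builds has the same shape that the single findOne call returns directly.

```javascript
// In-memory stand-ins for the rows returned by the three SELECT statements.
const postsRows = [
  { post_id: 42, name: "Lost Series Finale Approaching!",
    content: "<p>It's going to be pretty exciting.</p>" },
];
const commentRows = [
  { post_id: 42, content: "Cool!" },
  { post_id: 42, content: "Awesome!" },
];
const tagRows = [
  { post_id: 42, name: "Lost" },
  { post_id: 42, name: "Television" },
];

// Hypothetical helper: combine the three result sets into one post object.
function assemblePost(postId) {
  const row = postsRows.find((r) => r.post_id === postId);
  if (!row) return null; // no such post
  return {
    _id: row.post_id,
    name: row.name,
    content: row.content,
    comments: commentRows
      .filter((c) => c.post_id === postId)
      .map((c) => ({ content: c.content })),
    tags: tagRows.filter((t) => t.post_id === postId).map((t) => t.name),
  };
}

const assembled = assemblePost(42);
console.log(JSON.stringify(assembled, null, 2));
```

With the document model this assembly step disappears, because the post is stored in the shape the application reads it.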
Simplified Blog:
            Queries
   Get all of the posts tagged a certain way

                        SQL
SELECT * FROM posts
JOIN posts_tags USING (post_id)
WHERE posts_tags.name = 'Television';


                     MongoDB
db.posts.find( { tags: "Television" } );
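
The MongoDB query works without a join because an equality condition on an array field matches when any element of the array equals the value. The sketch below imitates that rule in plain JavaScript; `matches` is a hypothetical helper that handles only simple equality, and the sample posts other than 42 are made up for illustration.

```javascript
// Hypothetical matcher: supports only { field: value } equality conditions.
// For array fields, a document matches if any element equals the value.
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Illustrative data; posts 43 and 44 are invented for this example.
const posts = [
  { _id: 42, tags: ["Lost", "Television"] },
  { _id: 43, tags: ["Markets"] },
  { _id: 44, tags: ["Television"] },
];

const tagged = posts.filter((doc) => matches(doc, { tags: "Television" }));
console.log(tagged.map((doc) => doc._id)); // [ 42, 44 ]
```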
Easier Development
• Complex data structures (hashes, arrays)
  stored directly in their natural form
• No need for Object-Relational Mapping
• No SQL strings to build, so classic SQL injection is not a concern (input still needs validating)
• Fewer collections/tables in system
• Easy for new developers to pick up
Easier Deployment

• ALTERs are a pain and require downtime
  on large datasets
• You don’t need ALTERs in MongoDB!
• Though you occasionally still need migration
  scripts
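
As a rough illustration of what such a migration script might do, the sketch below backfills fields that older documents lack. It is plain JavaScript over an in-memory array; `migratePost` and the choice of fields to backfill are assumptions made for the example, not something described in the talk.

```javascript
// Hypothetical migration step: ensure every post has tags and comments arrays.
// Running it twice gives the same result, so it is safe to re-run.
function migratePost(doc) {
  return {
    ...doc,
    tags: Array.isArray(doc.tags) ? doc.tags : [],
    comments: Array.isArray(doc.comments) ? doc.comments : [],
  };
}

// Illustrative stored documents: an old one without the newer fields
// and a current one that already has them.
const stored = [
  { _id: 1, name: "Old post" },
  { _id: 42, name: "Lost Series Finale Approaching!",
    tags: ["Lost", "Television"], comments: [{ content: "Cool!" }] },
];

const migrated = stored.map(migratePost);
console.log(JSON.stringify(migrated, null, 2));
```

Because documents in one collection can differ in shape, a script like this can run while the site stays up, which is the contrast with ALTER that the slide draws.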
Horizontal Scaling

• Having no JOINs encourages scalable practices
• Denormalization and scalable design get baked
  in early
• Hard for a sane design NOT to scale
Replication
[Diagram: several replication topologies, each combining master and slave nodes.]
Auto-sharding
[Diagram: a client talks to mongos routers, which sit in front of the shards (groups of mongo nodes) and the config servers (mongod processes).]
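
The routing idea behind sharding can be sketched as a lookup from a shard-key value to the shard that owns its range. This is a simplified plain-JavaScript illustration only: the chunk table, shard names, and `routeByShardKey` are all hypothetical, and the real router handles much more (chunk splitting, balancing, metadata from the config servers).

```javascript
// Hypothetical chunk table, sorted by upper bound. Each entry owns the
// shard-key values below maxExclusive and at or above the previous bound.
const chunks = [
  { maxExclusive: 1000, shard: "shard-a" },
  { maxExclusive: 2000, shard: "shard-b" },
  { maxExclusive: Infinity, shard: "shard-c" },
];

// Pick the first chunk whose upper bound is above the key.
function routeByShardKey(key) {
  return chunks.find((chunk) => key < chunk.maxExclusive).shard;
}

console.log(routeByShardKey(42));   // shard-a
console.log(routeByShardKey(1000)); // shard-b
console.log(routeByShardKey(5000)); // shard-c
```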
High Performance
• As fast as a cache when retrieving individual
  documents
• Limited use of caching (posts are pulled live
  from MongoDB)
• Every pageview on Business Insider does
  multiple writes
• Just using a simple master/slave setup, running at
  about 5% capacity
Realtime Analytics
• MongoDB is highly optimized for realtime
  updates
• Up-to-the-second data on pageviews,
  referrers, click tracking
• Minimal development time, huge value to
  editorial
• Could be bolted onto a SQL-based website
  or traditional CMS
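
One plausible shape for this kind of tracking is a counter document per page and time bucket that is incremented on every pageview and created on first use. The talk does not say which operations Business Insider uses; in MongoDB this pattern is commonly done with an upsert and the `$inc` operator. The sketch below imitates it with an in-memory Map, and `recordPageview`, the key format, and the sample URLs are assumptions.

```javascript
// In-memory stand-in for a counters collection, keyed by "<url>|<minute>".
const counters = new Map();

// Hypothetical helper: increment the counter for this page and minute,
// creating the record if it does not exist yet (the "upsert" behavior).
function recordPageview(url, referrer, when) {
  const minute = when.toISOString().slice(0, 16); // "YYYY-MM-DDTHH:MM" in UTC
  const key = `${url}|${minute}`;
  const doc = counters.get(key) || { url, minute, views: 0, referrers: {} };
  doc.views += 1;
  doc.referrers[referrer] = (doc.referrers[referrer] || 0) + 1;
  counters.set(key, doc);
  return doc;
}

// Illustrative traffic: two views in one minute, one in the next.
recordPageview("/lost-finale", "twitter.com", new Date("2010-05-20T14:03:10Z"));
recordPageview("/lost-finale", "google.com", new Date("2010-05-20T14:03:45Z"));
recordPageview("/lost-finale", "twitter.com", new Date("2010-05-20T14:04:02Z"));

const firstMinute = counters.get("/lost-finale|2010-05-20T14:03");
console.log(firstMinute.views, firstMinute.referrers);
```

Reading the current numbers is then a single lookup per page and minute, which is what makes up-to-the-second dashboards cheap to build.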
Database File Storage
      (GridFS)

• Every image stored/served in the database
• Eliminates awkwardness/duplicate systems
  for replication, backups, test datasets, etc
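
GridFS stores a file as one metadata document plus a series of numbered chunk documents. The sketch below shows that split and the reassembly in plain Node.js. The helper names and the tiny chunk size are assumptions chosen for the demo (real chunks are far larger), and the field names follow the GridFS convention of `files_id`, `n`, and `data`.

```javascript
// Deliberately tiny chunk size so the example is easy to follow.
const CHUNK_SIZE = 4;

// Hypothetical helper: split a file's bytes into a metadata document
// and an ordered list of chunk documents.
function splitIntoChunks(fileId, filename, bytes) {
  const fileDoc = { _id: fileId, filename, length: bytes.length, chunkSize: CHUNK_SIZE };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += CHUNK_SIZE, n += 1) {
    chunkDocs.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + CHUNK_SIZE) });
  }
  return { fileDoc, chunkDocs };
}

// Hypothetical helper: put the chunks back in order and join them.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const image = Buffer.from("0123456789"); // stands in for image bytes
const { fileDoc, chunkDocs } = splitIntoChunks(1, "logo.png", image);
console.log(fileDoc.length, chunkDocs.length);
console.log(reassemble(chunkDocs).equals(image));
```

Since the chunks are ordinary documents, they travel with the rest of the data through replication and backups, which is the benefit the slide points to.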
Other Cool Stuff

• Map/Reduce
• Capped Collections
• Geospatial Indexes
• ... but we don’t use them (yet!)
Folks Using MongoDB
The Future (IMHO)
• NoSQL adoption grows rapidly
• Some sites use it for specific systems, some
  go all NoSQL
• Open-source CMSs support NoSQL
  (already happening)
• More diversity in datastores
• RDBMS still useful but no longer the only
  option
Questions?

Ian White
iwhite@businessinsider.com
twitter.com/eonwhite
