  • How Business Insider Uses MongoDB

    1. How Business Insider Uses MongoDB: The NoSQL Approach in Practice
       Ian White, GilbaneSF, May 20th, 2010
    2. Business Insider (www.businessinsider.com)
       • Multiple business verticals (tech, markets)
       • Dedicated full-time editorial staff
       • 3k original stories and 50k pieces of aggregated content per month
       • > 1.2m pageviews per day
       • Custom CMS powered by LAMP
    3. LAMP
       • Linux: OS
       • Apache: Webserver
       • MySQL: Database
       • PHP: Application
    4. LAMP at Business Insider
       • Linux: OS
       • Apache: Webserver
       • MongoDB: Database
       • PHP: Application
    5. What’s MongoDB?
       • NoSQL Database
       • Open Source (supported by 10gen)
       • Document-oriented
       • Schema-free
       • Highly horizontally scalable
       • Dynamically queryable
    6. NoSQL Means non-relational, next-generation operational datastores and databases
       • Document-oriented: CouchDB, MongoDB
       • Graph: Neo4j
       • Key/value: Cassandra, Redis
    7. NoSQL: no joins + no complex transactions = Horizontally Scalable Architectures
    8. [Chart: scalability & performance plotted against depth of functionality, positioning memcached, key/value stores, and RDBMS]
    9. Simplified Blog: a SQL approach
       Three tables of rows:
         posts (post_id, name, content)
         posts_comments (post_id, content)
         posts_tags (post_id, name)
       INSERT INTO posts (post_id, name, content) VALUES (42, 'Lost Series Finale Approaching!', '<p>It''s going to be pretty exciting.</p>');
       INSERT INTO posts_comments (post_id, content) VALUES (42, 'Cool!');
       INSERT INTO posts_comments (post_id, content) VALUES (42, 'Awesome!');
       INSERT INTO posts_tags (post_id, name) VALUES (42, 'Lost');
       INSERT INTO posts_tags (post_id, name) VALUES (42, 'Television');
    10. Simplified Blog: a MongoDB approach
        One collection of documents: posts (_id, name, content, comments, tags)
        db.posts.insert({
          _id: 42,
          name: "Lost Series Finale Approaching!",
          content: "<p>It's going to be pretty exciting.</p>",
          comments: [ { content: "Cool!" }, { content: "Awesome!" } ],
          tags: ["Lost", "Television"]
        });
    11. Simplified Blog: Queries
        Get a post and all its tags and comments
        SQL:
          SELECT * FROM posts WHERE post_id = 42;
          SELECT * FROM posts_comments WHERE post_id = 42;
          SELECT * FROM posts_tags WHERE post_id = 42;
        MongoDB:
          db.posts.findOne( { _id: 42 } );
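To make the contrast on this slide concrete, here is a minimal sketch in plain JavaScript. It uses in-memory arrays and a `Map` as hypothetical stand-ins for the three tables and the `posts` collection; no database or driver is involved, and the function names are illustrative rather than taken from the talk.

```javascript
// Relational shape: three arrays of flat rows (stand-ins for the three tables).
const posts = [
  { post_id: 42, name: "Lost Series Finale Approaching!", content: "<p>It's going to be pretty exciting.</p>" },
];
const postsComments = [
  { post_id: 42, content: "Cool!" },
  { post_id: 42, content: "Awesome!" },
];
const postsTags = [
  { post_id: 42, name: "Lost" },
  { post_id: 42, name: "Television" },
];

// Three lookups, then the application stitches the rows together by hand.
function loadPostRelational(postId) {
  const post = posts.find((row) => row.post_id === postId);
  if (!post) return null;
  return {
    _id: post.post_id,
    name: post.name,
    content: post.content,
    comments: postsComments
      .filter((row) => row.post_id === postId)
      .map((row) => ({ content: row.content })),
    tags: postsTags.filter((row) => row.post_id === postId).map((row) => row.name),
  };
}

// Document shape: the post already contains its comments and tags.
const postDocuments = new Map([
  [42, {
    _id: 42,
    name: "Lost Series Finale Approaching!",
    content: "<p>It's going to be pretty exciting.</p>",
    comments: [{ content: "Cool!" }, { content: "Awesome!" }],
    tags: ["Lost", "Television"],
  }],
]);

// One lookup by key returns the whole post.
function loadPostDocument(postId) {
  return postDocuments.get(postId) || null;
}
```

Both functions return the same nested object; the difference is how much assembly the application has to do to get there.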
    12. Simplified Blog: Queries
        Get all of the posts tagged a certain way
        SQL:
          SELECT * FROM posts JOIN posts_tags USING (post_id) WHERE posts_tags.name = 'Television';
        MongoDB:
          db.posts.find( { tags: "Television" } );
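The MongoDB query on this slide works because a scalar query value matches an array field when the array contains that value. The sketch below imitates that matching rule over a hypothetical in-memory array; it illustrates the semantics only and is not how MongoDB implements or indexes such queries. The sample posts other than `_id: 42` are invented for illustration.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const taggedPosts = [
  { _id: 42, name: "Lost Series Finale Approaching!", tags: ["Lost", "Television"] },
  { _id: 43, name: "Sample Markets Post", tags: ["Markets"] },
  { _id: 44, name: "Sample Untagged Post" },
];

// True when the field equals the value, or is an array that contains it.
function fieldMatches(doc, field, value) {
  const stored = doc[field];
  return Array.isArray(stored) ? stored.includes(value) : stored === value;
}

// Rough analogue of db.posts.find({ field: value }) for a single field.
function findByField(collection, field, value) {
  return collection.filter((doc) => fieldMatches(doc, field, value));
}

console.log(findByField(taggedPosts, "tags", "Television").map((doc) => doc._id));
```

Documents with no `tags` field at all simply do not match, which is why no JOIN or placeholder row is needed for untagged posts.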
    13. Easier Development
        • Complex data structures (hashes, arrays) stored directly in their natural form
        • No need for Object-Relational Mapping
        • No need to worry about SQL injection
        • Fewer collections/tables in system
        • Easy for new developers to pick up
    14. Easier Deployment
        • ALTERs are a pain and require downtime on large datasets
        • You don’t need ALTERs in MongoDB!
        • Though you occasionally still need migration scripts
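One way to read this slide: with no enforced schema, old and new document shapes can sit in the same collection, so the application either tolerates missing fields at read time or runs a small script to backfill them. The sketch below shows both options over an in-memory array. The `slug` field, the sample documents, and the helper names are hypothetical and do not come from the talk.

```javascript
// Two generations of documents in one collection: the first predates the
// hypothetical `slug` field and has no tags.
const storedPosts = [
  { _id: 41, name: "Old Post" },
  { _id: 42, name: "Lost Series Finale Approaching!", slug: "lost-series-finale-approaching", tags: ["Lost"] },
];

// Lowercase the name and collapse every non-alphanumeric run into one hyphen.
function slugify(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Option 1, read path: fill in defaults for fields an older document lacks.
function normalizePost(doc) {
  return {
    ...doc,
    tags: doc.tags || [],
    slug: doc.slug || slugify(doc.name),
  };
}

// Option 2, migration script: backfill only the documents missing the field.
// Returns how many documents were changed.
function backfillSlugs(collection) {
  let changed = 0;
  for (const doc of collection) {
    if (doc.slug === undefined) {
      doc.slug = slugify(doc.name);
      changed += 1;
    }
  }
  return changed;
}
```

The read-path default needs no downtime at all; the backfill is the kind of occasional migration script the last bullet refers to, and it is safe to run more than once.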
    15. Horizontal Scaling
        • The lack of JOINs encourages scalable practices
        • Denormalization and scalable design get baked in early
        • Hard for a sane design NOT to scale
    16. Replication
        [Diagram: master/slave replication topologies]
    17. Auto-sharding
        [Diagram: shards, config servers (mongod), mongos routers, client]
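As a rough illustration of what the router layer in this diagram does, the sketch below maps a shard key to a shard using a table of key ranges. The chunk boundaries and shard names are invented, and the code is a simplified model of range-based routing, not MongoDB's actual routing logic or metadata format.

```javascript
// Hypothetical chunk table: each chunk owns the half-open key range [min, max).
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Point lookup: exactly one chunk covers any given key.
function routeToShard(chunks, shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${shardKeyValue}`);
  return chunk.shard;
}

// Range query over [low, high): contact only shards whose chunks overlap it.
function shardsForRange(chunks, low, high) {
  return chunks.filter((c) => c.min < high && c.max > low).map((c) => c.shard);
}

console.log(routeToShard(chunkTable, 42));
console.log(shardsForRange(chunkTable, 900, 1100));
```

A query that includes the shard key touches one shard (or a few, for ranges); a query without it would have to be sent to every shard.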
    18. High Performance
        • As fast as a cache when retrieving individual documents
        • Limited use of caching (posts are pulled live from MongoDB)
        • Every pageview on Business Insider does multiple writes
        • Just using a simple master/slave setup, running at about 5% of capacity
    19. Realtime Analytics
    20. Realtime Analytics
        • MongoDB is highly optimized for realtime updates
        • Up-to-the-second data on pageviews, referrers, click tracking
        • Minimal development time, huge value to editorial
        • Could be bolted onto a SQL-based website or traditional CMS
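The talk does not show how these counters are stored, so the following is only a sketch of one common pattern: keep one small counter document per URL and time bucket, and increment it in place on every pageview, creating it on first use. In MongoDB this maps naturally onto an update with the `$inc` operator and the upsert option; here a `Map` stands in for the collection, and the document layout and function names are assumptions.

```javascript
// Hypothetical stand-in for a stats collection, keyed by URL plus minute.
const stats = new Map();

// Truncate a timestamp to the minute, e.g. "2010-05-20T14:03" (UTC).
function minuteBucket(date) {
  return date.toISOString().slice(0, 16);
}

// Upsert-style increment: create the counter document on first sight,
// then bump the pageview total and the per-referrer count in place.
function recordPageview(store, url, referrer, when) {
  const minute = minuteBucket(when);
  const key = `${url}|${minute}`;
  let doc = store.get(key);
  if (!doc) {
    doc = { url, minute, views: 0, referrers: {} };
    store.set(key, doc);
  }
  doc.views += 1;
  doc.referrers[referrer] = (doc.referrers[referrer] || 0) + 1;
  return doc;
}
```

Because each pageview is a tiny in-place increment rather than a new row, reading the current counts is a single lookup, which is what makes up-to-the-second reporting cheap.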
    21. Database File Storage (GridFS)
        • Every image stored/served in the database
        • Eliminates awkwardness/duplicate systems for replication, backups, test datasets, etc.
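GridFS stores a file as one metadata document plus a series of chunk documents, each carrying a `files_id`, a sequence number `n`, and a slice of the bytes in `data`. The sketch below reproduces just that chunking and reassembly idea with Node's `Buffer`; it does not talk to MongoDB, and the chunk size is passed in as a parameter rather than assumed.

```javascript
// Split a file's bytes into GridFS-style chunk documents.
function toChunks(filesId, data, chunkSize) {
  if (!(chunkSize > 0)) throw new Error("chunkSize must be a positive number");
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    // subarray clamps at the end of the buffer, so the last chunk may be shorter.
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Reassemble the file by ordering chunks on `n` and concatenating their bytes.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const image = Buffer.from("pretend these are image bytes");
const parts = toChunks("image-1", image, 8);
console.log(parts.length, fromChunks(parts).equals(image));
```

Since the chunks are ordinary documents, they follow the same replication and backup path as everything else in the database, which is the benefit the slide describes.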
    22. Other Cool Stuff
        • Map/Reduce
        • Capped Collections
        • Geospatial Indexes
        • ... but we don’t use them (yet!)
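For readers unfamiliar with the first item, here is a small in-memory sketch of the map/reduce idea, counting posts per tag. It is a generic illustration, not MongoDB's `mapReduce` command: in the MongoDB shell the map function reads the document from `this` and calls a global `emit`, whereas this sketch passes both in explicitly. The sample posts are invented.

```javascript
// Minimal map/reduce driver: run mapFn over every document, group the
// emitted values by key, then reduce each group to a single value.
function mapReduce(collection, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of collection) mapFn(doc, emit);

  const results = {};
  for (const [key, values] of groups) results[key] = reduceFn(key, values);
  return results;
}

// Hypothetical sample data.
const samplePosts = [
  { _id: 42, tags: ["Lost", "Television"] },
  { _id: 43, tags: ["Television"] },
  { _id: 44 },
];

// Map: emit (tag, 1) for each tag on a post. Reduce: sum the ones.
const tagCounts = mapReduce(
  samplePosts,
  (doc, emit) => (doc.tags || []).forEach((tag) => emit(tag, 1)),
  (key, values) => values.reduce((sum, value) => sum + value, 0)
);

console.log(tagCounts);
```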
    23. Folks Using MongoDB
    24. The Future (IMHO)
        • NoSQL adoption grows rapidly
        • Some sites use it for specific systems, some go all NoSQL
        • Open-source CMSs support NoSQL (already happening)
        • More diversity in datastores
        • RDBMS still useful but no longer the only option
    25. Questions?
        Ian White
        iwhite@businessinsider.com
        twitter.com/eonwhite